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This dissertation investigates different types of disorder problems by 
using sequential procedures for on-line implementation. The problem is 
considered within the framework of detecting abrupt changes in an observed 
random process when the disorder can occur at unknown times. The focus of 
this work is on quickest detection methods for cumsum procedures 
implemented for different parametric and nonparametric nonlinearities and 
their performance evaluation. Both the non-Bayesian (Maximum- 
Likelihood) and the Bayesian frameworks are presented but the focus is 
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L THE DISORDER AND CHANGE DETECTION PROBLEM 
FORMULATION— AN OVERVIEW 

A. INTRODUCTION 

This dissertation presents sequential decision methods both in the non- 
Bayesian (maximum likelihood) framework and in the Bayesian framework. 
The focus is mainly on non-Bayesian methods, where the goal is to detect, as 
quickly as possible, changes in statistical models of a random process when 
these changes can occur at a random time, while the false alarm rate should 
be lower bounded by some given constant. 

In the classical detection framework such procedures were considered by 
Wald (Wald, 1947), for which the binary hypothesis framework was 
developed under the assumption was that all the observations come from 
one model or from an alternative one. It was not until Page's work (Page, 
1954) in the non-Bayesian framework and Shiryayev (Shiryayev, 1961, 1963, 
1965) in the Bayesian framework that the problem was extended to detecting a 
change from one statistical model to a second model. Lorden (Lorden, 1971) 
showed that the cumulative sum tests as proposed by Page are asymptotically 
optimal when the mean time between false alarm tends to infinity, in the 
sense of minimizing the average delay time for detection. Recently, Poliak 
(Poliak, 1985) proved an optimality property for the Shiryayev rule. 

Two types of problems depend on the time element. The first is the 
disorder problem in which the given observations correspond to one 
statistical model until some unknown time after which the samples 
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correspond to another statistical model. Hereby we will use the notations 
disorder and change as synonyms, even though a disorder is referred to as a 
general change in density which describes the change in the statistical 
behavior of the model, a change will refer most of the time to changes in 
specific parameters like mean variance, etc. The second problem is the 
transient problem in which the disorder decays after some time. In this 
dissertation we will focus only on the disorder (change) problem. 

When a disorder occurs, the random variables we are concerned with are 
the change time and the model parameters after the change. As will be 
presented throughout this dissertation, the detection process refers to 
detecting the change as quickly as possible while ensuring infrequent false 
alarms, while the estimation process refers to estimating the change time and 
the model parameters after the change. This dissertation focuses on the 
detection element. The problem of joint estimation of the change time and 
the model parameters is also addressed and shown to appear in an explicit 
closed form in certain cases. 

The question of where do change detection problems occur is next 
introduced. Three typical situations in which change detection is a critical 
component are considered. The first, in which the detection is used to 
produce alarms during the monitoring of dynamical systems, such as failures 
in sensors (Willsky, 1976,1986), detection of tsunamis and earthquake 
prediction (Nikiforov, 1986), and detection of production failures (Assaf and 
Ritov, 1988). Many more applications in industrial and military 
environments can be considered. Survey papers for fault detection methods 
are given by Isermann (Isermann, 1984), and Gertler (Gertler, 1988). The 
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second situation arises in the area of adaptive algorithms, were the presence 
of abrupt non-stationarities in the signal causes severe errors in adapting the 
gains of the recursive algorithms. Thus, an abrupt change detection 
procedure is needed to improve the tracking capability of the algorithm. For a 
complete survey see Ljung and Gunnarsson (Ljung and Gunnarsson, 1990). 

Finally, the third type of application occurs when the change detection 
algorithm is considered as an integral part of the modeling of a signal or a 
system. The most popular applications are segmentation of speech signals 
using switching parameter methods within AR models (Andre-Obrecht, 1988) 
or various geophysical signals (Nikiforov, 1986). In such cases switching 
methods within the transition matrix of state-space models (Tugnait, 1986), or 
a modified Kalman filter is used to cope with changes modeled as abrupt 
transitions in the measurement matrices (Shumway, 1990). Also, the 
problem of outlier detection by modifying the Kalman filter was introduced 
by Pena and Guttman (Pena and Guttman, 1988). 

B. THE DISORDER PROBLEM FORMULATION 

1. The General Disorder Problem 

The change detection problem is presented within the hypothesis 
testing framework, thus, requiring some statistical knowledge about the tested 
hypotheses which in turn are based upon statistical models of the hypotheses 
before and after the disorder. The model based framework is rich enough to 
serve as a basis for the problem formulation, resulting in parametric type 
tests. As it will be presented later, certain types of change detection 
procedures known as cumulative sum or cumsum procedures are able to 



3 



cope with the parametric and nonparametric forms as well. Within this 
framework four types of change detection problems will be considered. 

Let Ho and Hi be the two (simple) hypotheses, corresponding to two 
possible probability distributions Pq and P\ on the observation space x. If a 
parametric notation is to be used, then the notation P(xl 0o) and P(x\ (?i) or 
Pq(x ) and Pi(x) will be used. The observations x\, xi, ... are assumed to be 
independent random variables. 

Type 1: Classical Binary Hypothesis Testing 

This problem was considered by Wald (Wald, 1947) and can be written as: 

Ho: x ~ Po, 

versus (1-1) 

H V .x~P h 

where the notation "x ~ P" denotes the condition that x has distribution P. In 
this problem there is no time index, hence, no direct formulation of a change. 
Type 2: Disorder Formulation 

This problem was considered by Page (Page, 1954) and can be presented in 
following manner. Let v be the unknown time when the change from Po to 
P i occurred. Let P v denote the probability when the change occurred at the V th 
observation. Let Po denote the probability there is no change, i.e., v = 

The problem can be presented as 

H 0 : X\,X2,... ~ Pq no change 

versus 
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H v : x lf x 2 ,...,x v _i ~P 0 . (1-2) 

change at time v 

*v"-*V+V" ~ ^1- 

If the observation record is finite and equal to say s, the detection problem 
becomes a multiple hypothesis testing, since the test "looks" for at least one of 
the H v (1 < v < s) to hold against Ho. 

Type 3: Transient and outliers formulation: 

Consider two change times v and t such that 



H 0 : Xi,X2,... 


~P 0 


versus 




. X\ f Xjj . . Xy—\ 


0 

a* 

1 


X v , Xv-i, • • •, -Xt-I 


~ Pi 


X T _!,... 


~p 0- 



(1-3) 



The same arguments about composite testing can be applied here. Notice that 
this framework can be extended to the so-called multiple disorder problem, in 
which the observations x T , x r +], ... ~ Pi (Pi being another probability density on 
the observation space). 

Type 4: Initial Condition Disruption 

For model based detection schemes based upon state-space, ARMA, etc., the 
initial condition is a part of the statistical model. Hence, besides the ordinary 
way to model the statistical change as a change from Po to Pi, a certain class of 
changes can be modeled as a result from changes in the initial condition. 
This problem is also time related since the change might occur at an 
unknown time. 
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Once Hi is decided, i.e., disorder detected, further questions arise, such 
as estimating the change time v, possibly to estimate do and 6\, and in some 
cases to diagnose which type of change actually occurred. Thus, the detection 
and estimation following the detection problems being two separate issues 
can be coupled, but it is important to distinguish between them. 

Both off-line ( n fixed) and on-line ( n growing) algorithms can be 
designed for solving such types of problems, and as shown in the sequel differ 
substantially, both from the change detection formulation and from the 
performance evaluation point of view. 

2. Solution Methods 

The solution for such problems is a function of several factors. 

a. Off-line versus On-line Tests 

In the off-line formulation, a given finite record is given 
X\, xi , ..., xj and a test statistic gj =g(x xi, ..., xj) > A has to decide whether or 
not the change occurred. In the on-line formulation, the test statistic gt = 
g(X], X 2 , •••, x t ) - ^ has to reach a decision the first time when gt exceeds a 
threshold A. 

b. Criterion 

For the classical detection problem (1-1), the criteria in the sense 
of Neyman-Pearson (Ghosh, 1970), is based on a test which maximizes the 
power or the probability of detection (the probability of deciding H] when H\ 
is actually true) subject to the constraint that the size or the false-alarm 
probability (the probability of deciding H\ when Ho is true) is less than or 
equal to a given value. 
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As seen from equation (1-2), in the off-line framework, the 
change detection problem involves multiple hypotheses testing, for which 
the Neyman-Pearson lemma is not valid (Ghosh, 1970). Therefore the test in 
this case cannot be defined as one of maximizing the power since H\ is not 
reduced to a simple distribution but a set of distributions. In such cases, the 
best property for a test is said to be Uniformly Most Powerful (UMP), i.e., tests 
which have the highest detection probability for each distribution of the 
alternative hypotheses H\. Therefore no UMP tests exist for change detection 
problems. In this case, those UMP properties can be recovered by using 
asymptotic analysis (Deshayes and Picard, 1986). In order to cope with the 
performance analysis of test statistic functions the following definition is 
needed. 

Definition: Stopping time. Let x\, xi, ... be the sequence of independent 
random variables. The nonnegative integer valued random variable N is 
said to be a stopping time for the sequence if the event [N = n} is independent 
of X n +\, X n+ 2/ •••• 0 

Hence, the event {N = n) corresponds to stopping after having observed 
x \, ... x„ and thus must be independent of the values of the random variables 
yet to come (Ross, 1989). 

For on-line processing, the criteria is modified. Notice that by 
using the formulation (1-2) for a large enough number of observations, the 
change will be detected with probability one. Thus, a natural criterion should 
be the delay for detection, subject to the constraint that the size of the test is 
upper bounded by a given threshold (Page 1954, Shiryayev 1963). Lorden 
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(1971) and Nikiforov (1983) use a slightly different version of the delay for the 
on-line problem: 

Let S n denote the test statistic at time n. Let N be the stopping 
rule, and let X n be a generalized threshold. Then: 

N = inf{n: S„ > X n ) (1-4) 

defines the stopping rule and stopping time. See Figure 1.1. 




Figure 1.1. General Characteristics of the Detection Model. The observation 
sequence {*„} is transformed into a sequence {S n }. A change in the model 
structure of {*„} results in a cumulative departure of {S„}. The change is 
detected by comparison of {S n } with a generalized threshold {Xn) 

(from Segen and Sanderson, 1980). 
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The worst case average delay D (Lorden, 1971) is defined by 

D= e|n|HiJ = su^>esssupE v |(N - v+l) + |xi,X2,---,*v-l} (1-5) 

where ( a) + = max(O^), 

where E v denotes the expectation of the change time under the probability 
law P v , where P v denotes the distribution of the sequence x\, X 2 , ..., under 
which x v is the first term with distribution P\. In other words, D is the 
smallest value such that for any v = 1, 2, ... 

E V {{N - v+ l) + |x 1 ,x 2 ,...,x v _ 1 J < D 

meaning that this "minimax" type criterion defines the best worst case for 
delay. 

Thus, the criteria is defined in terms of the quickest detection of a 
change subject to the constraint that the size of the test is upper bounded, i.e., 
the desire for large mean time between false alarms T, where T is also defined 
in terms of the stopping time 

T = E{N|H 0 } (1-6) 

which denotes the expectation under the no-change hypothesis Hq. The pair 
( T,D ) will specify the performance of a given algorithm. 

Notice that in the transient or multiple disorder setting of the 
equation (1-3), a fast detection is necessary since if f-v+l<D the transient 
cannot be detected. 

Thus, for the on-line framework, this natural criterion should lead to 
the optimal stopping rule, and the question that arises: are there test statistics 
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which are optimal in that sense? A positive answer will be presented in the 
sequel. 

Different types of criteria can be used for deriving optimal stopping 
times for change detection, see Bojdecki and Hosza (Bojdecki and Hosza, 1984) 
and Pelkowitz (Pelkowitz, 1987). 

For the off-line problem, this question is more difficult, because as 
was shown in equation (1.2), change detection problems are multiple 
hypotheses problems for which there exists no optimum test in the classical 
sense of power, (Neyman-Pearson lemma), hence, no UMP tests exist. In 
such situations, an asymptotic analysis for which UMP tests can be recovered 
is of interest. Deshayes and Picard (Deshayes and Picard, 1986) showed that 
UMP tests exist for likelihood-oriented methods in the sense of large 
deviation asymptotic analysis. (Sample size goes to infinity.) 

c. Optimal Stopping Rules 



it was shown that optimality exists only in the sense of asymptotic analysis. 
For the on-line point of view, in the non-Bayesian framework, the only 
optimality results are given by Shiryayev and Lorden. Lorden (Lorden, 1971) 
showed that for some constant y, the stopping rule N must satisfy: 



The speed in which a stopping rule detects a (true) change of distribution is 
evaluated by (1-5) 



The off-line point of view was addressed in the last section where 



E 0 {N} = E{N|v = -}>}-. 
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Lorden showed that a certain class of stopping rules is asymptotically (y -»°°) 
optimal, and that the cumsum procedure (Page's test which can be described 
as repeated sequential tests) belongs to this class. 

In the Bayesian framework, Shiryayev (Shiryayev, 1968, 1978) 
solved the problem. He considered a cost function whereby one loses one 
unit if N<v, and loses c units for each observation taken after v if N> v. The 
prior on v is assumed to be geometric. Shiryayev showed that the stopping 
rule prescribes stopping as soon as the posterior probability of the change 
having occurred exceeds a fixed level. 

d. Use of Prior Knowledge 

For change detection problems, prior knowledge can be useful in 

two cases: 

The first case is related to the problem of estimating the change 
time after detection. From the Bayesian point of view, the knowledge of the 
statistical nature of change time makes up the prior needed for such a test. 
Such knowledge on the distribution of the change time (or initial conditions) 
will assist in the quickest delay detection, i.e., estimation of the time change. 
In the non-Bayesian approach this is equivalent to assuming a uniform prior 
distribution over the observation set, resulting in a detector which computes 
the likelihood function for all possible disorder times. 

The second case is the estimation following detection of the 
statistical model after the change of the parameter set 6\. For test procedures 
implemented on line, the use of prior knowledge on the parameters set 6q, 9 1 
e © improves the quickest detection since in such situations, only a short 
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sample is available from the true change time to the detection time, thus, it is 
difficult to identify 6 \. 

In this context, we shall consider two different forms of the prior 
on the distribution after the change. The first form of prior uses the 
composite hypothesis testing framework. As an example, the Darmois- 
Koopman family of distributions (Govindarajulu, 1975, Siegmund, 1985) 
which is presented in the sequel, allows suitable parametric tests, using the 
assumption that the statistics after the change have a form of a one parameter 
exponential distribution. The second form of prior uses the popular method 
of multiple models whenever the set of parameters 6 \ is finite. Such 
methods can be found in the literature (Anderson and Moore, 1979). 

The problem of detecting the change time and estimating the 
statistical model after the change is a difficult task because of the reasons 
given. Except for cases where the solution to the detection-estimation can be 
made explicit, like estimation of the jump amplitude in the case of additive 
changes in Gaussian linear models (Willsky and Jones, 1976), the combined 
detection-estimation solution cannot be shown in a closed form. This point 
is further discussed in Chapter III when the generalized likelihood ratio 
algorithm (GLR) is applied to linear models. 

This dissertation focuses on the methods of the quickest detection 
problem which provides in the case of detection of jumps in the mean, a 
convenient way to estimate the unknown jump. However, it will be shown 
that a lot of complicated problems like changes in spectral properties or 
eigenstructure (changes in State Space models, AR models or ARMA models) 
can be transformed to changes in the mean of a statistic function g n , enabling 
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the use of quite easy detection schemes to detect rapid changes in the 
dynamics of the signal model. As shown in the sequel, such detection 
algorithms are based on the cumsum procedure which provides a tradeoff of 
computation efficiency and complexity. 

3. Performance Evaluation 

In the off-line processing, the process is observed only over a finite 
interval, hence only a finite number of samples is used. The problem is then 
considered as that of classical hypothesis testing (1-1). In this case the 
performance is measured in terms of probability of detection versus the 
probability of false alarm. 

In the on-line processing, the approach of "quickest detection" is 
adapted as the performance criterion used in sequential analysis. This 
approach is used by Nikiforov (Nikiforov, 1979, 1980). For this setting, the 
terms run length and average run length (ARL) will be used in order to 
determine the number of observations needed to reach a detection decision. 
This function will be shown to be the main tool in the performance 
evaluation of the test procedures. The first time the test statistic, i.e. the 
stopping rule (statistic used to determine the change) crosses the pre- 
determined threshold according to desired performance, is called the stopping 
time or sometimes also the Markov time (Shiryayev, 1978). 

C MODEL BASED METHODS 

In designing the change detection/ estimation algorithms, the philosophy 
developed in Chow and Willsky (Chow and Willsky, 1986) distinguishes two 
tasks which are depicted in Figure 1.2. 
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The first task is the generation of change indication signals (residuals) 
sometimes also called error signals. These signals are designed to reflect the 
possible changes in the measurements or models and to make a subsequent 
detection possible. These signals are designed to have a certain mean (usually 
zero) and a white noise correlation signature when no change occurs. This is 
referred to as the “white noise" hypothesis. In general the mean value or 
spectral properties change under a disorder. 

The second task is design of the stopping rules (or decision rules) based 
upon these residuals. 

Sometimes an additional task diagnostics is added. This is the problem of 
estimating the origin of the change (for example: which pole location 

changed). A broad class of change detection methods makes explicit use of a 
mathematical model of the observed system or signal. For example, the 
setting of the system or signal in a state-space form enables the use of Kalman 
filtering methods to generate the residuals (innovations in this case). This 
twofold problem will be presented next. 
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Figure 1.2. Model Based Change Detection Scheme as a Twofold Problem 

(from Gertler, 1988) 
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1. Generating the Change Indication Signals (Residuals) 

As shown in Figure 1.2, modeling is an integral part of the change 
detection process, usually for creating "white" residuals under the "no 
change" hypothesis. Using the state-space setting, residuals may be generated 
in a number of different ways, which will be presented briefly. 

cl Straight Input-Output Residuals (Gertler, 1988) 

Given the state-space model 

x(n+l) = A x(n) + Bu(fi) 

y(n) = Cx(m) 

an equivalent input-output model can be presented by using the shift 
operator with matrices G(z) and H(z), z being the shift operator and H being a 
diagonal matrix: 

H(z) • y(n) = G(z) • u (n) 

where 

G(z) = C[adj(Iz-A)B] 

H(z) = det (Iz-A)I. 

Defining 

q(«) T = [u(n),y(n)] T 
F(z) = (G(z)-H(z)] 

then the input-output equation can be written as: 

F(z) • q(n) = 0. 
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Consider now the model matrix F(z) which represents the discrepancies 
between the input-output models G(z) and H(z), and the true system G(z) and 
H(z): 



F(z) = [G(z), -H(z)] 

where 

G(z) = G(z) + AG(z,t); H(z) = H(z) + AH(z,t). 

Such discrepancies may account for plant faults or changes. Applying this 

a 

equation to the measurements q(n) with the model matrix F(z) yields the 
residuals vector e(n): 

F(z) • q [n) = e(n). 

b. Filtering and Parameter Identification Methods 

A popular solution (Willsky, 1976) consists of monitoring the 
innovations or the prediction errors, using estimation filters or parameter 
identification methods. Using the optimal state estimator, the Kalman filter 
is designed according to the "normal mode" or no change situation. If prior 
knowledge is known about the change or if a diagnosis is required in addition 
to detection, a possible solution consists of using a bank of Kalman filters 
designed according to all the possible models for each hypothesis (see Figure 
1.3). Notice, that the Kalman filter produces under the null hypothesis zero 
mean and independent residuals. Consequently deviations from this 
behavior are indicators of change. However, in some practical problems, it 
may be necessary to monitor a function of the innovations rather than the 
innovations themselves (Basseville, 1988). 
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Figure 1.3. Filtering Methods for Generating the Residuals. 

(a) "normal mode" filter 

(b) generating error signatures due to possible change hypotheses 

In identification-based methods, a residual quantity is defined in 
relation to the plant parameters. The plant is identified in a fault-free 
reference situation, then repeatedly on line. The results of the latter, are 
compared to the reference values and a parameter error (residual) is formed. 
Remark: these model-based methods do not include explicit model switching. 
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In Chapter VI such methods will be described, thus enabling us to modify the 
Kalman filter to detect the change. 

c. Redundancy Methods 

These techniques are used primarily for failure detection (sensor 
failures). Two classes can be distinguished. The first class is direct or physical 
redundancy. Using several identical sensors measuring the same quantities, 
the differences between each possible pair may reflect a change. These 
residuals are processed using voting methods (Willsky, 1976). Another 
approach consists of searching subsets of measurements for inconsistency, 
thus indicating changes. 

The second class is indirect or analytical redundancy. This 
method monitors of all the existing relationships between the inputs and the 
outputs that are zero under the hypothesis of no change exists. These 
techniques were studied by Deckert (Deckert et al. 1977), Chow and Willsky 
(Chow and Willsky 1980, 1984), and others. 

2. Statistical Testing (stopping rules) 

The resulting residual vector contains the combined effects of the 
changes and the noise (as well as the modeling errors). Two approaches can 
be considered. 

The first consists of the deterministic modeling of the changes. (It is 
important not to confuse the random nature of the change time and (usually) 
the change magnitude with the deterministic modeling of the change). For 
example, consider the case: 

X n = 6 n + n n n n ~ N(0,<t 2 ) 
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where 



e n = \ 



00 

01 



1 < n < v-l 



n> v 



v being the change time (random). Therefore, the effect of changes on the 
residuals has to be separated from that of the noise. This is done by statistical 
testing, making use of the assumption of the the non-changing statistical 
structure of the noise, versus the changing statistical nature of the 
observations (change in mean, variance, etc.). 

In the second approach the observed changes in the time series are 
modeled in a statistical manner. Therefore, the noise is part of the modeling. 
Hence, the statistical nature of the changes can be modeled as changes in the 
noise characteristics. 

Several testing methods will be described briefly, while the main part 
of the dissertation will focus on a subset of them. 



a. Compound Scalar Testing (x 2 -type off-line test) 

Consider a single scalar test statistic 

e T (n)-S~ l -e{n)Z A. 

Ho 

where e(n ) is the residual vector and S e is the covariance matrix of the vector 
e. Then, under the no change hypothesis, the residual vector e(n ) consists of 
normal i.i.d. components. Hence, the threshold A follows a chi-square 
distribution with p degrees of freedom (p being the vector size or number of 
residuals). Recursive chi-square tests are also available. 
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b. Likelihood-oriented Methods 



The likelihood ratio approach is a general tool for change 
detection. Different methods can be considered (Ghosh, 1970; Willsky, 1980). 
For example, consider a test which compares the hypothesis Hi of nonzero 
residual mean to the null hypothesis Hq of zero mean. The decision is based 
on the likelihood ratio between the joint distributions of the residuals 

° 8 P{e(l),e(2) e («)|H 0 } 




The numerator and the denominator, respectively, are the Probability 
densities of the observed time series under the two hypotheses. If the 
residuals are independent, then (1-7) is easy to compute. Under the 
hypotheses testing given by (1-2): 



S,”(e) = logn 






1=1 P o( e i) 




P iM 
PoW 



If the residuals monitored are the innovations of a Kalman filter, then it can 
be shown (Anderson and Moore, 1979) that the distribution of these 
innovations is given by the conditional distribution of the observations x; 
(conditioned by their past values), hence, (1-7) can be written in the general 
form 



s l=Z lo S 



i=l 



P l(*il*i-1,- 
P o(*z |*i-l,- 



-,*o) 

-,*o) 



( 1 - 8 ) 



This kind of test is called cumulative sum test (or cumsum test) and can be 
written as 
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(1-9) 



s f = £«(*;)• 

i=k 

where 

x, ' =tog 77717 7T 

Notice that in this case the computation of S" is recursive. 

The tests based on (1-8), (1-9) are stopping rules (i.e., tests which 
enable us to estimate the change time v), based upon the knowledge of the 
parameterized densities before and after the change. In this case the estimated 
stopping time can be found by using the maximum likelihood estimate (MLE) 
under H\, namely 

v = argmaxS”. (I - 10) 

l<v<n 

In general, the statistical properties after the change (i.e., using the 
parameterized format of P] as 6\) are not known. Hence, the cumsum test 
(1-9) can be used to reach the change decision 

max maxS”((?O/0i)^ A. (1-11) 

l<v<n 6 ! H 0 

This test is called the generalized likelihood ratio (GLR) test (Willsky and 
Jones, 1976) and involves a double maximization of high computation cost. 
Only in special cases like additive changes in linear systems modeled in the 
state-space representation, it can be shown (Willsky and Jones, 1976) that the 
effect of the resulting changes in the innovation vector e„ are also additive. 
Therefore, in the case of Gaussian state and observation noises, there are cases 
for which explicit solutions for 6\ exist. For example, if 0\ represents the 
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mean after the change. Then, the maximization over 6\ is explicit (Basseville, 
1988), resulting in joint estimation of the vector (v, #i) by recursive 
computation of S" and b n . 

The theoretical optimality of the GLR has been investigated 
recently (Deshayes and Picard, 1986) from the off-line point of view. They 
show that under asymptotic exponential decay rates of the error probabilities 
a , /} (where a is the Type 1 error probability or the false alarm probability and 
similarly /J is the Type 2 probability or the probability of detection) and for 
specific families of distributions, the GLR tests are UMP. 

Remarks 

• The stopping rule based upon a cumsum statistic can use any general 
nonlinearity g(-). For example, instead of the probability ratio of 
conditioned observations as in (1-9), a probability ratio of the 
observations Xj can be used. In this case 

sW=ios SS' 

• Both off-line and on-line implementations (using "sliding" windows) 
can be used. Examples for using this method for ARMA and AR 
models can be found in the literature (Segen and Sanderson, 1980, 
Basseville, 1986, and Basseville and Benveniste, 1983). 

c. The Statistical Local Approach 

This approach is used in order to overcome the main drawback of 
the GLR test, namely its computation cost due to the double maximization. 
This approach was introduced by Nikiforov (Nikiforov, 1986) for on-line 
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detection of changes in spectral characteristics of ARMA models. The original 
idea consists of looking for small changes in models and using a special type 
of Taylor's expansion of the log-likelihood function. Thus, the nonlinearity 
g (•) becomes 




e = e 0 - 



( 1 - 12 ) 



Deshayes and Picard (Deshayes and Picard, 1986), showed that for the statistic 
g(x n ) there exists a central limit theorem. Any change in Q is reflected as a 
change in g(x n ), for which stopping rules based on cumsum tests can be 
designed. 

d. Bayesian Oriented Methods 

Bayesian oriented methods are based upon some prior statistical 
knowledge on the change time, or uses some knowledge on the switching 
model used to describe the statistical behavior of the changes. The use of 
hidden Markov models to describe the changes in state-space models 
(Shumway, 1990) is very popular, and leads to some change detection 
algorithms. However, in the Bayesian framework, it is very difficult to find a 
general solution because of the use of different cost functions or different 
prior assumptions. As mentioned in Section B.2 of this chapter, Shiryayev 
(Shiryayev, 1977) introduced a Bayesian competitor as an alternative to the 
Page cumsum test. Recently, Poliak (Poliak, 1985), proved an optimality 
property for the Shiryayev-Roberts rule. 
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e. Heuristics Associated with a Two-model Approach (Basseville, 

1986) 

This method called the “two models approach" is in fact a 
simplification of the GLR test. Implementation of GLR tests leads to 
“boundary" problems, because models are not very reliable when identified 
on short segments. In order to overcome this problem, the two model 
approach was introduced. These algorithms are less efficient than likelihood 
ratio methods but more efficient than the tests based upon the local approach. 

D. ORGANIZATION OF THE DISSERTATION 

This dissertation focuses on the on-line analysis of detection algorithms, 
hence, the quickest detection methods are explored. Both the non-Bayesian 
and Bayesian points of view are investigated but the focus is on non-Bayesian 
(maximum likelihood) methods. In this context, sequential analysis and a 
certain type of cumulative sum procedures which form a generalization of a 
test first studied by Page (Page, 1954) to detect a change in the distribution of 
random variables observed at random times are investigated. The Bayesian 
point of view is also included. Shiryayev (Shiryayev, 1978) results are shown 
to play a key role in any Bayesian approximation. 

Different disorder types (Type, 2, 3, and 4) are investigated throughout the 
dissertation in the sequential (on-line) detection framework. 

The body of this dissertation is divided into four groups as shown in 
Figure 1.4. Chapters II and HI form the maximum likelihood solution of the 
detection problem while Chapter V presents the Bayesian approach. 
Chapter IV provides additional tools to analyze the performance of both the 
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non-Bayesian and Bayesian methods by using diffusion type approximations. 
Finally, Chapter VI presents a MAP estimator to a Type 4 problem, namely, 
discontinuity type disorder. 
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Figure 1.4. Sequential Methods for Quickest Disorder Detection 

Each chapter includes an introduction and a summary section which will 
assist in relating all the topics presented throughout this dissertation. An 
appendix which summarizes the basic concepts of hypothesis testing and 
detection theory is also provided. 
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II. SEQUENTIAL METHODS FOR QUICKEST DETECTION OF CHANGES IN 
PROBABILITY: THE NON-BAYESIAN FRAMEWORK 



A. INTRODUCTION 

Consider the observation process {x n } with probability density Pg(x n ) or 
conditional probability density P$(x n \ x n -\, xq) depending upon an 
unknown parameter 9. This parameter can describe two different situations: 
In the first situation, 9 can be for example, the mean or variance of the 
density of the time series, and will reflect directly the statistical properties of 
the time series. In the second situation, using some convenient 
parameterization of a system or signal denoted by 9, i.e. the state-space 
representation or ARMA modeling, 9 describes the dynamics of a system or 
signal. 

In the context of detecting jumps (sudden changes) in the parameter set 9, 
we are interested in detecting changes in the dynamics, or in the statistical 
properties of complicated structures. 

Since the jump time is unknown, the problem is twofold: detection of 
the change, and estimation of the change time. In this chapter we will focus 
on the detection problem only. As shown in the last chapter there are 
different issues that must be addressed: on-line versus off-line 

implementation, parametric versus non-parametric methods, etcetera. These 
issues were briefly presented in Chapter I, and are investigated in more detail 
in the context of change detection in this chapter. In particular we will be 
addressing the following points: 
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1. Off-line versus On-line Viewpoints 

Even though the final goal is to implement on-line (sequential) 
procedures, the off-line viewpoint is significant, since it can be used to derive 
on-line tests. This point will be clarified in this chapter. These two 
viewpoints differ in: (a) problem formulation and (b) performance evaluation 
as related to different criteria. 

In the off-line formulation, the change detection problem is 
implemented as multiple hypotheses testing, for which the Neyman-Pearson 
lemma is not valid so that no UMP tests exists. Thus, the criterion from this 
viewpoint is that of classical detection problems, namely: size and power of 
the test. 

In the on-line formulation, the criteria is modified to detect a change 
in the parameter 6 as quickly as possible. In the on-line point of view the 
detection is performed by a stopping rule of the general form 

N = inf {n : S n > X) 

S n being an appropriate test statistic (see Chapter I). 

The performance of a stopping rule is evaluated by T the mean time 
between false alarms (1- 6), and by D the delay for detection (1- 5) as proposed 
by Lorden (Lorden, 1971). This is a "minimax" type of average delay referred 
to as the best least favorable change time. 

The difference between the off-line and the on-line viewpoints is 
significant: whereas no optimal test does exist in the off-line framework, 
optimal stopping rules do exist in the on-line framework for independent 
identically distributed (i.i.d.) sequences with known distributions before and 
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after the change. Moustakides (Moustakides, 1986) extended this result to the 
non i.i.d. case. Since in this chapter we take the non-Bayesian approach, 
another difference is viewed: In the off-line processing we assume a uniform 
prior distribution over all the observation set, resulting in a likelihood 
detector which computes the likelihood for all possible disorder times, 
whereas in the on-line approach, the disorder time is assumed to be an 
unknown parameter. Lorden showed (Lorden, 1971) that a certain class of 
stopping rules called cumulative sum tests (cumsum) are optimal in the 
sense of his criteria. The cumsum tests form a rich enough family of tests, 
and are the focus of investigation of this chapter. In particular, the test called 
the Page-Hinkley stopping rule is investigated in depth. 

2. Composite Testing 

As mentioned earlier, optimal stopping rules do exist in the case of 
i.i.d. sequences with known distributions before and after the change. When 
the distribution after the change is not known, a composite framework needs 
to be used. This issue is addressed by using the Darmois-Koopman 
Distribution for a one parameter exponential family. 

3. Parametric versus Non-parametric Methods 

The nonlinearities or transformations g( ) used for the cumsum 
detection procedures (1-9) can have a parametric or non-parametric form 
(sign, rank tests, etc.). The analyses will provide a general framework which 
can be used for either type. 
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B. ORGANIZATION OF THIS CHAPTER 

The main goal of this chapter is the analysis of sequential methods for 
change detection, namely, the cumsum procedures and in particular the Page- 
Hinkley stopping rule. The purpose is to set a general framework in which 
the transformation (nonlinearity) used can be of a general form (different 
parametric and non-parametric forms). Thus, the following two sections (C 
and D) can be considered as a "guided tour 7 ' through theorems and results 
needed to understand and analyze cumsum procedures and their 
performance (presented in Sections E and F). 

In Section C, sequential tests known as one-sided and two-sided Wald 
tests are presented in the classical detection formulation. Some basic 
theorems (Wald identity) which are shown to be important for the general 
disorder or change detection are presented. 

In Section D, the sequential tests implemented with the log-likelihood 
function known as the Sequential Probability Ratio Tests (SPRT) are 
presented. Optimal properties of these tests are shown. Performance 
evaluation of the one and two sided SPRT, known as Wald approximation 
are analyzed. Within this framework, composite SPRTs using the Koopman- 
Darmois family of distributions are presented. Basic performance measures 
in the presence of strong and weak changes are shown. 

In Section E, we introduce the cumsum stopping rules in the on-line 
framework (using Lorden's criterion). The Page test is presented and shown 
to be as a repeated one-sided Wald's test. Both the off- and on-line 
viewpoints are presented. Observing the renewal property of cumsum tests. 
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using Ladder variables and results from queueing theory, new aspects of 
cumsum tests are addressed. The Page test is also shown to be a maximum 
likelihood detector. Finally, optimal properties of the Page tests are presented; 
this test is shown to be optimal in the sense of Lorden criteria. 

Section F presents the performance evaluation of Page's test. The run 
length function is shown to be the primary tool needed for the analysis of 
delay and average false alarm rate of the test. Using the results in Sections C 
and D we derive two results known as Lorden's and Wald's approximations. 
Finally, the asymptotic performance framework is introduced and used for 
two important results: first, the asymptotic approximation of the run length 
function, and second, the generalization of Lorden's results to general 
nonlinearities, other than the log-likelihood transformation used in the Page 
stopping rule. A general framework of asymptotic performance evaluation of 
Page's test is provided. The resulting measure is shown to be used for any 
nonlinearity in the presence of various noise distributions. 

Section G presents a short summary of the main results of this chapter. 

G SEQUENTIAL TESTS 

An alternative approach to the fixed size tests is to fix the desired 
performance and allow the number of measurements to vary in order to 
achieve this performance. 

To formulate the problem, suppose that the observations {x*: k = 1, 2, ...} 
are i.i.d. and distributed according to 

H 0 : Xjt ~ P 0/ * = 1,2,... 

versus (2-1) 



30 



Hr. X k ~P 1; A = 1, 2, ... 



where Pq and Pi are two possible distributions. A sequential test is defined by 
the pair of indicator sequences (<P4) where: 

0 = [fa: k = 0, 1, 2, ...} is the stopping rule indicator, (fa 9\ n -> {0,1}), 

d = A: = 0, 1, 2, ...} is called the terminal decision rule. 

For an observation sequence (x k : k = 0,1, 2, ...} the rule (<p4) makes a decision 
d(x i, *2, •••/ *n) whether or not any change occurred. In particular, sequential 
tests can be described as follows: Continue sampling as long as 

<p(x i, X 2 j •••, x n ) = 0, and stop when fax\, xi, • .., x^j) = 1. We define two kinds of 
tests: two-sided and one-sided. 

The two-sided sequential test is based on the definition of the cumulative 
sum: 



Sn = 



i=l 



( 2 - 2 ) 



S 0 = s 



where g: is a memoryless function of the observations, and s is called 

the initial score. 

We detect a change according to the following stopping rule: 



<p n {x v x 2/ ...x n ) = 



0 

1 



if S n e ( a,b ) continue 
if S n e(a,b) stop 



(2-3) 



where a, b are the stopping thresholds; b < 0 < a. 

Also, the terminal decision indicator is given by: 
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d n {x v x 2 




0 if S„ < b no disorder 

1 if S n > a disorder . 



The stopping time N (sometimes called the sample size or the run length 
of the test) is defined as: 



In classical detection theory, a is defined as the probability of false alarm, and 
l 3 is the probability of miss. In terms of hypothesis testing the acceptance zone 
zv a is defined as the zone where x k e w a or Xk ~ Pq. The rejection zone w r is 
defined by Xk e vo r or Xk ~ P i (disorder zone). The indifference zone W{ is 
defined by Xk e &-zv a -zv r (See Figure 2.1). 

The one-sided test is defined by letting: b — » 



N = inf {w: S n c ( a,b )} 



and the exit times are defined by: 



N fl = inf{n: S n > a } 
Nf, = infjn: S n < fc} 



(2-4) 



The error probabilities for the two-sided tests are defined as: 



a = Pr{S N > a\H = H 0 } 
p = Pr{S N <b\H = Hi}. 



<p n {x i,x 2 ,...x n ) = 



0 if S n < a continue 

1 if S n > a stop 



and 



(2-5) 




disorder 
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and the stopping time is given by 



N = inf{n:S„ > a} 

= 00 S n < a. 

The error probabilities are defined in this case as: 

a = Pr{S N > a|H 0 } 

P = Pr{Sfj 




Figure 2.1. Two-sided Sequential Test 



1. The Fundamental Identity (Wald's Identity) of the Sequential 
Analysis 

This identity forms the basis of subsequent analysis for the Operating 
Characteristics (OC) and ARL functions of a Sequential Test (ST). It gives a 
convenient way to derive the moment of the sample size required to 
terminate the ST. 
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Theorem (Wald, 1947): 



Let x\, x 2/ •••/ be independent random variables and let 
S n = ^ et b, S„ ) be any sequential test of Ho: 0 = 6q against H\: 

6 = &\ based on i.i.d. {g(x n )}, and let N be the stopping time for this test. 

Let \f/i(h) denote the moment generating function of the random 
variable g(x) under the hypothesis Hf. 



The proof can be found in Ghosh (Ghosh, 1970, p. 208) or Feller 
(Feller, 1971, p. 603). 

2. Applications of Wald's Identity 

As a direct corollary to Wald's identity, immediate results for the 
ARL function can be obtained: 



then, the average run length (ARL) is given by Govindarajulu 
(Govindarajulu, 1975): 




i = 0,l 



for every real h for which <//,(/:) is bounded. Then, if P(g(x) = 0 |h,) < 1 and 
f’(lsWI < °° I Hi) = 1/ we have: 





define z = g(x), 
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(2-7) 



Bounds on the stopping thresholds can be associated with the ST(a, b, g) 



a < log 



l-l 

a 



b > log 



P 

1 -a 



( 2 - 8 ) 



The strict equalities hold if and only if b < 0 < a, and in terms of the error 
probabilities for I a, b I >0: 



a <(l-p)e a 



p < (1 -a)e b . 



(2-9) 



These approximation are known as Wald approximations and were derived 
by ignoring the “excess over the boundaries" (Siegmund, 1985). Notice that 
we can get yet cruder inequalities when we consider the asymptotic case 
where a i 0, ft i- 0. Then: 

o < P < e . 



3. Comparison of Sequential Tests (ST) and Fixed Sample Size Tests 
(FSST) 



Our object is to investigate the number of samples saved by an 
ST (a,b,g) over the corresponding optimum FSST, both designed to achieve 
the same performance ( a,P ). 

The relative efficiency of STUAg) at 6 is defined (Ghosh, 1970) by 



RE(6) = 



nj^P) 

E{N\0 } 
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where n{a,p) is the sample size required by FSST test and E(N|#} is the ARL 
function of the ST test, both designed to achieve the same performance (a,/?). 
It can be shown (Poor, 1988) that for the case of a simple sequential detection 
of a constant signal in the presence of white Gaussian noise, using a 
likelihood ratio detector for both the ST and the FSST, the limiting RE is 
given by 

lim RE = 4. 
a=f}-> 0 

Thus for vanishingly small error probabilities (with a = ft) the SPRT requires 
on the average only one-fourth as many samples as does the FSST test. 
Further discussion can found in Ghosh (Ghosh, 1970). 



D. SPRT TESTS 



When the test procedure given by (2-2 and 2-3) uses the the log-likelihood 
ratio as the nonlinearity g(*) 



g(x) = log 



dP{x I0j) 
d?(x\0 o ) 



the sequential test is called the sequential probability ratio test (SPRT). The 
relation between any ST(a,fc,g) to SPRT (A,B) is given by 

A = e a B = e b 



w'here 



b <0 < a and 0 < B < 1 < A. 

The bound approximations (2-8), (2-9) can be converted to SPRT test terms by 
placing e° = A and e b = B. 
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The SPRT test has a fundamental property which is extremely important 
and will be used in the sequel (Therrien, 1989): 

under disorder: E{g(xi) | > 0 

under no disorder: E{g(x,) | (^} < 0. 



( 2 - 10 ) 



In the following sections, several properties of SPRT test will be 
presented, and the two- and one-sided SPRT tests will be analyzed, followed 
by the composite hypothesis framework for SPRT. 



1. Optimal Properties of SPRT 

For testing a simple hypothesis against a simple alternative with i.i.d. 
observations, the SPRT test is optimal among all sequential and fixed sample 
size tests in the sense of minimizing the expected run length both under H 0 
and under H 1 among all the tests having no large error probabilities. The 
following theorem establishes this result. 



The Wald-Wolfowitz Theorem (1948): 

Among all tests (FSST and ST) for which Prfaccept H\ | Ho) < a, and 
Pr{accept Ho I Hi) < and for which E{N | i = 0,1; the SPRT with error 

probabilities a and (3 minimizes both E{N | 0o) and E{N \ d\). □ 

The proof can be found in (Ghosh, 1970). The optimal property of the 
SPRT test can be viewed as analogous to the Neyman-Pearson lemma. 



Definition (Wijsman, 1960): 

A SPRT is said to have a monotonicity property if when the upper 
stopping bound of the SPRT is increased and the lower bound is decreased, 
then at least one of the error probabilities decreases. 
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Theorem (Lehmann, 1959) 

Let x\, *2/ ••• be independent random variables having probability 
density P(x;B) which has monotone likelihood ratio. Then any SPRT for 
testing Ho : 6 = Bo against Hy 6 = B\ (Bo < B\) has a nondecreasing power 
function. □ 

The proof can be found in Lehmann (Lehmann, 1986). 

2. The Termination Property of SPRT 

The SPRT test is a closed test if and only if, the termination property 
holds for every B e S. When g(x,) are i.i.d. any SPRT is closed under the 
following mild restriction (Poor, 1988): 

suppose that for any B e 0, g(xi ) are i.i.d. random variables and 
P(g(xj) I B) < 1, then: 

• lim Pr{/7 > N|0} = 0. 

n—>°° 

• there exists a fo>0 such that the moment generating function E{e nl \ 0} 
exists for all real t< to- 

This means that the entire statistics of n can be found. The result is 
that the SPRT or the ST([j3/(l-a)],[(l-j3/a])) based on Wald's approximations 
are always closed. Ghosh (Ghosh, 1970) extended the result to the situation 
g(xi) are not i.i.d. 

Another optimal property of the SPRT was shown by Wald which 
established a lower bound on the ARL of competitors of the SPRT. This 
result will be presented in the sequel (2-24) when presenting the problem of 
composite hypothesis testing. 
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3. The Operational Characteristics (OC) and ARL Functions of Two- 
sided SPRT Tests 

The use of Wald's identity (2-6) forms the basis of certain bounds for 
the OC function Q(6) and ARL functions for the SPRT. 

Wald's Approximations 

Wald's approximations are based on the use of the moment 
generating function of g(x) (2-6) provided that we can find two nonzero real 
numbers ho and hi such that 

V i (h i ) = E{exp(h-g(x))\e i } = l i = 0,1. (2-11) 

Existence and uniqueness of such roots are guaranteed when g(x) has a 
nonzero mean and satisfies certain other conditions (Feller, 1971). 

The key results for our purposes is that x/i(h) = 1 has: 

• one and only one nonzero root 

-°° < h{d) < 0 if E{g(x)|6} = E 0 {s(x)}>O 

( 2 - 12 ) 

0 < h(6) < ~ if E{g(x)|0} = E e {g(*)} < 0. 

• No non zero real root if E{g(x) \ 6} = 0. 

When we try to detect a change from a negative trend E[g(x) I 0o) < 0 to a 
positive trend E{g(x) I 6\) > 0 then, it implies that hid]) < 0 < h(6o). 

Notice that the roots are functions of two parameters: the probability 
density of the observations P(x) and the nonlinearity g(x). The approximate 
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formulas of SPRT test ST(a,b) for the OC and ARL are derived when g(xi) are 
i.i.d. and b < 0 < a, using the assumption of no excess of S n over a and b. 

Lower bounds for the ARL function of two-sided SPRT test under 
hypotheses Ho and Hi in terms of the error probabilities are given by Wald 
(Wald, 1947): 



L(e 0 ) = £o{N}> 



(l-tt)log(ft/l-a) + qlog(1-/?/tt) 



L(0 1 ) = £ 1 {N}> 



P logjp / 1 - a) + (1 - j3)log(l -/?/«) 



if 9 = 9 0 . 

(2-13) 

if 6 = e v 



Bounds for the operational characteristic function Q(9) are given by 
Ghosh (Ghosh, 1970): 

• For detecting a change of positive trend (upward change) h(9o) > 0: 
exp{ft(6>)-a}-l ^ ^ < flg)exp{ft(g)-g}-l 



exp{fr( 9) • a} - r\( 0)exp{/t( 9) • b } S( 0)exp{/?( 9) • a} - exp{/i(0) • 1?} 

• For detecting a change of negative trend (downward change) h(9o) < 0: 
exp{ft(6>)-a}-l ^ < T){9)exp{h{e)-a}-l 



exp{/?(0) • a} - <5(0)exp{/i(0) • 77(0)exp{/i(0) • a} - exp {h(9) ■ fcj 

where (2-14) 



t}{9) = inf £E exp{/z(0) • g(x)} 

!<$<oo I 



S ( e ) = n ^ f , $ E \ exp{/?(0) • $(*)} 
0<£<1 I 



exp{/i(0)-g(x)}<|;ej<l 

exp{/i(0)-^(x)}<-|;0|>l. 
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Recall that for any test, the OC should result in Q(0o) > 1 -a, and Q(0]) < P (see 
Appendix). Thus, the motivation is to find bounds for the ARL in terms of 
the OC function and the stopping thresholds a,b. These upper and lower 
bounds for the ARL are given by Ghosh (Ghosh, 1970): 



I^Magtas 



L(e)=E g [N] 



^ a 2 [l-Q(6>)l + b 2 Q(g) 

e{s 2 M|s} 

Ja + y(e)][l-Q(e)] + feQ(e) 



E{sM|0) = O (2-15) 



E{s(xp}<0 



where y(6) = su|3 £{#(*])- r|g(;t) > r > 0; 0} is the "excess over the boundary." 
The mean time between false alarms T is given by L(6q) while the delay for 
detection is given by L(6\). 

Detecting a change from a negative trend E[g(x) I 0o) < 0 to a positive 
trend E{g(x) 10]} > 0 (upward change) can result in effective bounds for L(0). 
Notice that Q(0o) ^ l - « and Q(6\) < p, result in consistent inequality directions 
in (2-15). Thus, upper bounds can be evaluated in the case of disorder 
detection. Similarly, bounds for L(6) in the case of detecting a change from a 
positive trend to a negative one (downward change) can be found by 
reversing the inequalities in (2-15). 
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4. The Operational Characteristic OC and ARL Functions for One-sided 
SPRT Tests 

For detecting a change from a negative to positive trend, the 
probability that the one-sided test does not stop under 6q, can be found by 
using the limit of the two-sided OC function, as b tends to negative infinity. 
Thus, this probability is lower bounded by: 

Pr{L(0 o ) = co}= Urn Q(0 O ) 
b— 

> Um exp{/i(6> 0 ) a}-l 

~ exp{h(6 0 ) • a} - r?(0 o )exp{/i(0 o ) • b} 

_ exp{fr(0 o )-fl}-l 
exp{/i(0 o )-a} 

Notice that the obtained lower bound avoids the use of the functions 5(6) and 
rj(9) which are difficult to generalize. 

The probability that the one-sided test terminates under $o which is 
the size (a) of the one-sided test is upper bounded by 

a = Pr{S N > «|H 0 } = Pr{L(S 0 ) < ~} 

= l-Pr{L(e 0 ) = “} (2-W) 

Sexp{-h(8 0 )“}- 

This result is very important and will be used in the sequel when analyzing 
the cumsum procedures due to Lorden's criteria. 

An upper bound for the ARL function of the one-sided test under 6\ 
(Delay for detection) is obtained by using the upper bound for the ARL of the 
two-sided test (2-15): 
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!.(<»,) = E,{N}Slim 

0 — >-oo 



[i + r(9)][i-Q(«,); 


+ ! >Q(9i) 


E{g «l«l] 





Since b -> -«*, the OC function Q(#i) is a decreasing function (monotonicity 
property of the SPRT function), E{g(;t)| #i) > 0 (detecting a positive trend), 
hence, the right-hand side is a decreasing function and the inequality is 
preserved as Q(0 3 ) — » 0, resulting in: 

L(fl 1 ) = E 1 {N} = £{N|e 1 }s^i^j E{s(*)|e,}>0. (2-17) 



5. SPRT for Composite Hypotheses 

Although the SPRT was derived from a test of a simple hypothesis 
against a simple hypothesis, it was shown that from the on-line point of view 
of detecting abrupt changes, optimal stopping rules do exist in the case of i.i.d. 
observations with known distributions before and after the change. When 
the distribution after the change is not known, some other hypotheses can be 
considered. Thus, it is natural to consider to test for example H o: 0 < 0* 
against Hi: 6 > 9*. 

Wald (Wald, 1947) considered the method of weight functions in 
order to deal with unknown composite alternatives where the alternative 
may be a parameter within a surface (Rejection Region). If the method of 
weight function is not feasible, so-called open-ended (one-sided) likelihood 
ratio test procedures can be considered. Lorden (Lorden, 1971) investigated 
that approach for the problem of open ended (one-sided) tests for the one 
parameter exponential Darmois-Koopman families of distributions. This 
approach leads to easily computed procedures to obtain approximations to the 
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detection probability and ARL functions of the SPRT of composite hypotheses 
using only the theory developed for simple hypotheses. The following shows 
that this is generally possible in the context of a one parameter exponential 
family, and will form the base for Lorden's cumsum procedure. 



cl Composite Testing for Darmois-Koopman Distribution Families 
(Siegmund, 1985) 

Consider a general SPRT test defined by (2-2) and (2-3) with the 
additional assumption that x\, xi , ... are i.i.d., so that 



sf =io S fr 



i=l 



p i(*.) 

fbfe)' 



Next we follow Siegmund's analysis (Siegmund, 1985) to derive a new 
observation. Let P, P* be third and fourth density functions, such that the 
original test of Pq against Pi is equivalent to a test of P against P* with new 
stopping boundary values, such that 



P'M [piwT 

PM L p oMj 



6\ ^ 0. 



(2-18) 



Note that P*(x) must satisfy 




Pi(x) 

POM 



- 1*1 



P(x)dx = 1. 



Define 



hence 



z(jt) = log 



PjM 

p oM 
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f e z ( x ^P(x)dx = 1. 

J—oo 



Now we define a function b{6) such that 

P e z ^ e P(x)dx = e b ( e ) * 1 (2 - 19) 

where 0 represents the statistical "distance" between the null and the 
alternative hypotheses. Notice that b(6\) = b( 0) = 0. If the last integral 
converges then 

r = 1 

J — oo 



and 



P g {x) = e z ^ 6 ~ b ^P(x) 



and 



p ei x ) z(x)8-b(8) 
P(x) 

i0 



(2-20) 



represents the new test since 



PM. 

P(x) 



The resulting test defines a 



one-parameter exponential family o 
tests can be evaluated easily. 

Differentiation of (2-19) w.r.t 6 gives: 

b '( e ) = ir« z W' p e( x ) dx = E x{ z i x )} 



h w 

p o(*). 

distributions under which composite 



and 



b ”( e ) = = var x( 2 W) ^ 0 (2-21) 



so b(9 ) is convex. The desired 6 \ * 0 satisfying (2-18) exists if and only if 
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b ’(°) = J ^ oo z(x)P{x)dx = E x {z{x)} * 0 



since bid) is convex and b(& i) = b(0) = 0 (see Figure 2.2). 
The original test of Pq against Pi 



N = inf< 



TV 



n ^1 ( x fc ) 

IVoM 



« (a,b) ► 



( 2 - 22 ) 



is equivalent to a test of 



N = inf< 



n : 





Since P(x ) represents the null hypothesis under (2-18), it is clear that 
for composite testing, b'(0) = E[z(xi)} = E[z(x)|Hoj ^ 0 implies that under the 

null hypothesis (no disorder) the test should give a negative trend b'( 0) < 0 
when detecting a change from a negative to positive trend (see Figure 2.2). 
This result is consistent with another one which is presented in the sequel. 
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namely, that it is worthwhile to bias the detector if it is known that before the 
disorder occurs, the test will have zero mean. This gives some degree of 
robustness to the test under composite hypothesis testing. 

b. Performance Evaluation 

The following proposition establishes an important result about 
the performance of the SPRT within the composite framework. This result 
will be shown to play a key role in Lorden's work about the optimality of 
Page's test in the on-line framework (minimum average delay for detection), 
by assigning a lower bound on the ARL for competitors of Page's test. 

Proposition (Wald, 1947): 

Given a two-sided sequential test of H o'-X<= do against H\:Xe 6, 
suppose N] and Nj are stopping times for x\, *2, ••• e X such that: 

Pqq (N] < °°) < cc < 1 and Pg(/V 2 < °°) < < 1 

where a and [5 are the false alarm and miss probabilities (respectively). 

Define: 



this is the information number or the Kullback-Liebler number. Then: 




(2-23) 



• 1(6, 6q)- E 0 {min(Ni,N 2 )} > (1 -f3)\na ^^2 

• and for N 2 — » +°°(one - sided test) and f3l 0, 



(2-24) 




□ 



The proof can be found in Wald's book (Wald, 1947, p. 197) 
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Remark: The last proposition gives a lower bound on the average delay for 
detection D = E{N |Hi) for any stopping rule for which a = Po(N<«0 < 1 (see 
Figure 2.3). For the SPRT we have the approximate relations (2-13) between 
the ARL and the error probabilities. The last proposition generalizes (2-17) for 
composite tests, in asserting that the ARL function in (2-13) is approximately 
minimal. 




Figure 2.3. Sequential Test Exit Times 

c. Performance of Sequential Composite Tests in the Presence of a 
Weak Signal 

In classical detection theory, the locally optimum detector 
maximizes the slope of the power curve with respect to signal strength 
(evaluated at zero signal strength or at the presence of a weak signal) for a 
fixed false alarm (Neyman-Pearson locally optimum procedure). In this 
composite alternative hypotheses approach, the alternatives 6 are close in the 
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sense of metric or distance to the null hypothesis #o • It can be shown 
(Kassam, 1987) (Poor, 1988) (Kazakos, 1977) that the locally optimum detector 
in the classical detection problem of Ho: x, ~ P(x| $o) versus Hi: X; ~ P](x) = 
P(x | 6) for 6 > (fo is given by: 



8lo(x) = - 






Pq'W 

PoW 



■ -^ ln { p w«)} k* 



(2 - 25) 



where 6- Oq indicates the "distance" between H] and Ho, and 0—>0o indicates 
a weak signal situation, resulting in the locally most powerful (LMP) 
nonlinearity g/ 0 . For the ST defined by (2-2) we can define the Signal-to-Noise 
Ratio (SNR) (Kassam, 1988): 



SNR 4- 



(E{S„|P }) 2 

var{S„|e 0 } 



We seek to maximize the SNR when 0 —> 6q. 

The efficacy £ of a test is defined (Kassam, 1988) as the limiting 
incremental signal to noise ratio: 



£(g)= lim 

n— 

0->0O 




var{s(x)|0 o } ' 
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The nonlinearity g(x ) that maximizes the efficacy is the local optimum 
nonlinearity g{ 0 (x) = -Pq'(x)/ Pq(x ) which is also the Neyman-Pearson locally 
optimum procedure. In this case the efficacy is equal to Fisher's information 
for a location shift test, namely: H o: x\ ~ f(x) versus Hy Xj ~ f(x-6), and is given 
by (Kassam, 1987): 



6. Practical Criticism of the SPRT and Truncated Tests 

The optimality property of the SPRT is a remarkably strong property 
but it applies only to simple hypotheses. Even for the simple case of constant 
signal detectors as shown in (Poor, 1988), it is necessary to know the signal 
value in order to implement the test. This is in contrast to the Fixed Sample 
Size tests which are UMP for 6 > 0. For applications involving composite 
testing, the open continuation region can lead to very large sample sizes, 
especially when E{log[/i(x)//o(;r)]} = 0. Thus, although the ARL of the SPRT is 
finite with probability 1, it is not bounded. This difficulty can be overcome by 
modifying the SPRT to stop sampling and make a hard (single-threshold) 
decision after the ARL has reached some maximum number of samples. This 
type of test is known as the truncated test and can be described as follows: The 
sequential test is defined by 



In the absence of a definite upper bound on N, we define an upper bound M. 
Hence, the new (truncated) stopping rule is given by 




N = inf{«: S n ( a,b )}. 



min(N,M). 
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Another problem associated with the SPRT is the estimation 
following detection. If we want to stop sampling as soon as it is possible to 
tell in which of two subsets of the parameter space a parameter lies, then 
usually, the estimation procedure will require an adequate number of 
samples which is larger than the sample size. A possible solution is to 
artificially enforce a larger sample size. However, sequentially stopped 
versions of the estimators are biased, while we would like to consider 
unbiased estimators. The problem of estimation following sequential tests is 
not a part of this work. However, in the disorder detection framework, we 
are interested in randomly stopped averages where m is a random 

variable. The Anscombe-Doeblin theorem (Siegmund, 1985) shows that such 
averages are asymptotically normal under quite general conditions. 



E CUMSUM PROCEDURES 



Assuming that a given process has i.i.d. observations x\,xi, ..., whose 
distribution possibly changes from Po to P\ at an unknown point in time v, 
then, in the hypothesis testing framework the problem can be presented as: 
H 0 : x v x 2 ,... ~Pq 



versus 



H v : X] / x 2 / ...,x v _] ~P 0 



v>l 



x V' x v + !'••• 



~P, 



(2-26) 



Let P v and E v denote the probability measure and the expectation under P v 
respectively, when the change from Pq to Pj occurs at the v th sample 
( v ,= 1 , 2, ...). Let P o denote the probability that there is no change, i.e., v' = «=. 
Using Lorden's definition (Lorden, 1971): 
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D = E]{N} = su^ess sup E^N - v + l) + |xi,X 2 , , | 

where: (a) + = max(0,a) (2- 27a) 

which is the worst possible or the least favorable conditional, expected 
detection delay or quickness of reaction to a change (disorder). Thus, a 
"minimax" type of criterion is defined for which the delay D is the smallest 
value such that for every v >1 

E v j(N- v+1) + |a: 1 ,X2,...,a: v ,_ 1 J < E^N} 

almost surely under Fp. 

The goal is to find the stopping time N which allows the quickest 
detection of the change, subject to: 

E 0 {N) > y (2-2 7b) 

The constraint implies that if the change does not occur, then the expected 
time for false alarm is no less than the threshold of y (where y — » °° 
asymptotically). 

Several ad hoc proposals to solve this multiple hypothesis problem that 
at least one of the H v hold (1 < v' < n ) against Ho- The most well known 
procedures are the Page-Hinkley and Shiryayev-Roberts tests, and will be 
presented in the sequel. Both are based on the probability ratio, hence, 
presuming the properties presented in the last section. 

1. The Page Cumsum Test (Page, 1954) 

Page's procedure has two equivalent implementations: Recursive 

test which can be considered as a repeated modified one-sided SPRT test with 
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constant stopping limits, and a Repeated SPRT with a moving indifference 



zone. 



a. Repeated SPRT with Moving Indifference Zone 

Consider the test: 




where 




(2-28) 



S o = 0 



The indifference interval equals to (0 ,a). The stopping rule based upon (2-28) 
is defined as 

N* = infin: S„-minSi->fli (2-29) 

l 0<k<n J 

Note that g n = S n - min measures the current height of the random walk 

0<k<n 

S k/ k = 0, 1, 2, ... above its minimum ,r alue. Whenever the random walk 
establishes a new minimum, the process forgets its past and starts again in the 
sense of a renewal process (see Figure 2.4): 



This renewal property has important consequences. It means that 
N* can be defined in terms of a sequence of one-sided SPRTs as follows: 




(2-30) 
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N\ = infjw: S„e(0,fl)} 

if S N > a, then: 

otherwise: 



N* = N V 

S N = min Su and define: 

1 0 <k<Ni 



N 2 - inf|«: S Ni+n ~S Ni € (0,a)j 



(2-31) 



if S Ni+Nz -S Ni >fl, then: 
otherwise: 



N* = N\ + N 2 
mi 

0<k<N- l +'N 1 



^n,+n 2 - ^^ij) ^ 



and in general. 
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Figure 2.4. Page Test (Moving Indifference Zone) 
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The overall stopping time is given by: 



N* = N] + N 2 + ••• + N M 



(2-32) 



and is called the extended stopping time, since it consists of sum of single 
SPRT stopping times, where: 



is the number of the repetitions (renewals). By (2-33), M is geometrically 
distributed (see also (Siegmund, 1985)) with: 



which expresses the extended stopping time in terms of the expected stopping 
size and error probability of a single SPRT. 

b. Recursive Implementation 

As will be shown, the recursive implementation has two 
interpretations. 

The first is the relation to the repeated one-sided Wald sequential 
test with boundaries 0 and a, which forms a renewal process whenever the 
random walk S n hits the lower boundary 0 (see Figure 2.5). The renewal 
process is repeated until such time that a Wald test exceeds the threshold a. 
Thus, the process S n can be described as 




(2-33) 



£{M} = 1 / Pr{s Nl > a}. 



M 

Define: N* = ^N,*, and using Wald’s identity (2-7) we obtain 
/=! 



E{N’} = E{Nj}E{M}- 




(2-34) 



(3-35) 



S„ = max{o,S n _, + #(*„)} 

S 0 -s 

while the extended stopping time is given by: 



N* = inf|n:S„ > a} 



This representation is equivalent to the original Wald test 
stopping rule: 



N = inf{n:S n < 0 or S n > a) 



which is repeated from the initial score So each time S n < 0 (zero being the 
renewal boundary, hence, the name: repeated one-sided Wald Test), and so 
on, until such time that a Wald test exceeds the threshold a. 




Figure 2.5. Recursive Implementation of Page's Test 



The second interpretation of the recursive algorithm is related to 
the connection of the one-sided first passage problem with a single server 
queue. It will be shown that a queueing process w n can be described in terms 
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of a random walk S n . This fact forms the basis of the asymptotic distribution 
analysis of w n as 

Assume the customers arrive at single server at time a\ < a\ + az 
< a] + a 2 + a 3- These arrival times are the arrival epochs which form a 
renewal process. Assume, a\, az , ... are i.i.d. and let b n ( n = 1, 2, ...) denote the 
service time and w n the waiting time of the n^ customer. Suppose that the 
(n- \) th customer arrives at epoch t. His service time starts at epoch t+ w n - 1 
and terminates at t + w n - 1 +fr „_ \ (See Figure 2.6). The next customer arrives at 
time t+a n . He finds the server free if w n -i +b n -i < a n , but has a waiting time 
(server busy) w n = w n - \ +b n - 1 -a n if this quantity is greater or equal to 0. 
Denote the queueing process by x n = b n -i -a n . In short: 

’wn-l+xn i(wn-l+Xn ^0 



w n 




if w n -\+Xn < 0 



or: w n = max {0, ivn-l+Xn) 



m = o 

This result shows that if the service times b], bz, ■■■ are i.i.d., then the x n 's are 
also i.i.d., hence the process iv n is a random walk which resets to 0 whenever 
it enters (-<>=,0). In order to describe the random walk w n in terms of the 
random walk generated by the random variables x n , define: 

S n = *1 + XZ + ... + x„ 

and adhere to the notation for ladder variables. Define v as the subscript for 
which Si > 0, Sz ^ 0, ..., S v-2 ^ 0, but S v < 0. By definition, v is the first 
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descending ladder epoch denoted by . Up to this epoch, the customers 
1, 2, v— 1 had positive waiting times w\ = si, 102 = S 2 , w v - 1 = s v _i. The v 
customer is the first one to find the server free. The first conclusion is (Feller, 
1971): The descending ladder epochs correspond to the customers who find 
the server free, (i.e. w k = 0) and constitute a renewal process with recurrence 
times distributed as (Since the continuation of the random walk beyond 
epoch is a probabilistic replica of the entire random walk). 

Suppose now that customer v-1 arrived at epoch r. His waiting 
time was w v -i = S v -i, the epoch of his departure is x + w v -\ + b v - \ (see Figure 
2.6). The customer v arrived at epoch r + a v , when the server is free. Thus, 
the time for which the server is free is given by 

free time = r + a v - (r + w v -\ + b v - 1 ) = a v -w v -\ - b v - \ 

— — X V' W y_l ~~ ~X V — 5 V'— 1 = — S y. 

But by definition S v is the first descending ladder height My 
Thus, as the second conclusion we have: The duration of free periods are 

i.i.d. random variables which constitute a renewal process with recurrence 
time distributed as 

To summarize, customer number k which arrives at epoch 
+ ... + T k ., is the customer that finds the server free. At the epoch of his 
arrival the server has been free for -0-( k time units, at the same time the k th 
descending ladder height is given by 9{~ . 

The remarkable statistical property of the random walk as of 
containing two imbedded renewal process: the ladder epochs and ladder 

heights, and the fact that the random walk is a probabilistic replica of the 
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entire random walk after the first ladder epoch (and each other ladder epoch) 
enables important results to be found about the distribution of the ladder 
variables in terms of the first ladder variables. It is easy to follow the next 
analysis of Page's cumsum tests (2-29) and (2-35) in terms of ladder variables. 

c. Page Procedure Revisited 

The first Page’s version presented by (2-28) and (2-29), implies that 

the time k for which min S*. gets its minima, is a descending ladder epoch. 

0<k<n 

Hence, at that time k the test is renewed. The descending ladder epochs 
indicate the time where the change is more likely to happen. A change is 
declared when the test is terminated, i.e., the "distance" from the last 
descending epoch is at least a. For the repeated one-sided Wald’s SPRT 
version (2-35), the descending ladder epochs are defined at the times where 
the random walk hits the lower boundary 0 (see Figure 2.7). At that ladder 
epochs the test is renewed. Once again, the test measures the "distance" 
between the current value of the random walk from the last ladder epoch. 
This distance is equivalent to the "statistical distance" between Pq and Pi as 
defined by (2-26) or the disorder distance. Notice that this analysis was done 
for detecting upward changes. Similarly, for detecting downward changes we 
will use ascending ladder epochs and the test terminates when the test 
reaches a "distance" a below the last ascending ladder variable. 

Three important observations can be made: The first, as pointed 
out before in the analysis of Page’s test in a composite hypothesis testing, is 
that it is worthwhile to bias the detector if it is known that before the disorder 
occurs, the test will have zero mean. It can be shown from Figure 2.8 that a 
random walk with negative drift will improve the chance of rapid disorder 
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detection under the restriction of low false alarm rate, since the number of 
ladder epochs will be larger, resulting in more renewals, thus having the 
effect of "forgetting" the irrelevant past observations. This result is supported 
analytically in the sequel when it is shown that the expected delay for 
detection is reduced by biasing the test. 




Figure 2.7. Random Walks w„ and S n Containing 
two Renewal Processes: J if ^ 
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30 




Figure 2.8. The Relationship between the Recursive Implementation W n and 
the Random Walk Process S„ (From Feller, 1971) 
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The second observation is related to the first by realizing that each 
time the test is renewed (having the effect of resetting the test, hence 
"forgetting" the past), the test or detector behaves like an adaptive detector, 
since when reaching the descending ladder epoch, the past noisy observations 
containing no data about a possible change can be ignored. The fact that at 
each ladder point the likelihood of a change is the greatest implies that since 
the disorder is a local phenomena, the detection will occur if the signal-to- 
noise ratio and/or disorder duration is large enough, resulting in a threshold 
passage. Thus, this adaptive detector acts like a low-pass filter which filters 
the incoming signal except the changes. 

The third observation leads to the analytical equivalency between 
the Page tests (2-27,2-29) and (2-35) and is found in Siegmund (Siegmund, 
1985). 



Let S n = x-[+...+x n . By backward recursion, 

w n = max(0, w n _- , + x n ) = max^O, ( w n _ 2 + *„_i ) + + x n ) 
— max(0, iv n ~2 + 1 + x n , x n ) 

= max|o, (w n _ 3 +x n - 2 ) + +x n . l +x n/ x„) 



= max(0, uv 3 +x n _ 2 x„) 
=...max(0, S n , S n -Si, S„-S 2 ,..., S n -S n _!) 

= „ m , ax ( S n ~ S k)= S n ~ m , ax Sk 
0 <k<n 0<k<n 



(2-36) 



which shows the equivalent interpretation of the two procedures. Namely, 
the queueing variable w n measures the departure of the process S n from its 
last maxima. 
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This shows also that the distribution of w n and its asymptotic 
behavior can be studied. 

Theorem (Feller, 1971): The distribution of the queueing variable w n is 
identical to the distribution of the random variable M„, where: 





M n = max[0,S,,...,S B ] 

OZkZn 




where: 


H 

■HI 

II 


(2-37) 






□ 


Hence, 


Pr{te n > a} = Pr{N* ( a ) 




where 


N* (a) = inf{«: S n > a} 


(2-38) 


and 


lint Pr{w n > a} = Pr{ N* (a) < <»} 

°° 




This relation is used 


in the sequel to derive an expression for the probability 



of the ARL function of the test (see 2-52). 

The Wiener-Hopf integral equation (Feller, 1971) can be used to 
find an explicit solution to the probability distribution rn(x) = Pr{M„ <x) = 
Pr{tt’„ <x). 

d. The Page Test as a Sequential Maximum Likelihood Detector 

Let the problem be specified as in (2-26). When Page’s test is 
implemented with the log-likelihood ratio (repeated SPRT Test), it is 
equivalent to a sequential implementation of the maximum likelihood 
detector. The log-likelihood function l(x ], ..., x n ) is given by: 



64 



K x i 



>*n)= Ilog[P 0 (^)] + Z lo g[ P l( X <)] 
i=l i=v 



= Z lo g 

1 = V 



Ml 

p oi x i ) 



+ X lo g[ p o( x >')] 



i = 1 



Notice that the last term does not depend on the disorder time v 
and can be neglected. 



Define: 



s" v = 2>g 

i=v 



p o(*i) 



and replacing the unknown jump time v by its maximum likelihood 
estimate under H v , we get the following change detector: 

8n{v) = max S” = S” - min sf 

1 <k<n 

= S n - min (2 - 39) 

1<k<n 

resulting in the same detection test as in (2-28,2-36). 



2. Optimal Properties of the Page Test 

In this section we will review two important results due to Lorden 
(Lorden, 1971). The first result (the following theorem) will enable the use of 
Wald approximations (2-16,2-17) to find an efficient way of calculating the 
performance measure for Page's test. The second result establishes the 
asymptotic optimality of Page's test in the sense of Lorden's criterion. 



a. Bounds on the Performance of Quickest Detection for Repeated 
One-sided Tests 

Let N be the stopping variable of a one-sided Wald test: 
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N = inf{«: S n > a) 



for some statistics {S n } defined as functions of the i.i.d. observations x\, xi, •• 
Let N* be the stopping variable of the same test applied to xk, xk+ 1 , •••, for k = 1, 
2, ..., and define 



N* = minfN k + Jt-l) 

*21 1 1 

N* is the extended stopping variable of the one-sided test which stops when 
one of the sequence of tests [Nk) applied to xk, xk+ 1, stops the first time. 

Theorem (Lorden, 1971): Let N be a one-sided stopping variable with respect 
to x i, X 2 , such that 

Pr{N < oo| H 0 } < a. 

Let Nk denote the one-sided stopping variable obtained by applying N to x\ t, 
Xk+ 1 , and define 



N *= min{N fc + It - 1}. 

Then, 

E 0 {N'}>-=y (2-40) 

a 

and for any alternative distribution Pj, 



£,{N*}<£,{N) □ 

Notice that the one-sided Wald test is applied to Xk,Xk+i, ■■■, stopping the first 
time one of these tests stops. This result can now be viewed by using the 
renewal argument: Each time the test statistic S k (2-35) falls below zero, S k is 
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reset to zero indicating that a new test is starting from k + 1 and so on until the 
first test reaches the stopping boundary. 

This theorem establishes the optimality of Page's test ( N *) versus 
unrepeated one-sided tests ( N ). This result will be used in the next section to 
derive a performance measure for Page's test. 

b. Asymptotic Optimality of Page's Test 

Recall Lorden's criterion definition for the performance of 
cumsum procedures. The stopping rules N must satisfy 

E{N\ v = oo) = E 0 {N) > y. 

The quickness for which the stopping rule detects a true change in 
distribution is evaluated by E]{N} given by (2-27a). 

The problem of minimizing E^N} subject to the constraint 
Eo(N) > y becomes more interesting if we replace the distribution P\ by the 
Darmois-Koopman family of distributions {Pq, 6e 0} with 6 unknown, and try 
to achieve small E 0 {N} (defined as E^N}) for each 6 subject to Eo(N) > y. To 
handle composite {P#}, one-sided sequential tests of Pq vs. { Pq } are applied to 
Xkf xk+ h •••/ for k = 1, 2, ..., stopping the first time one of these tests stops. 

Lorden showed that we can simultaneously minimize E e {N} for 
each 6 asymptotically as y-» °° for a wide class of tests. Lorden's main result 
was that Page's test (N*) implemented with the log-likelihood function and a 
zero score with a stopping boundary /belongs to this class. The following 
result will show that Page test achieves the lowest possible E]{N*}, resulting 
as an optimal test both when P\ is known, and when Pi is unknown 
(composite testing case). 
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Let N and N* be defined as in the last section. If N is the stopping 



variable of a one-sided SPRT of P o vs. Pi with likelihood-ratio boundary 1 /a, 
then by using Wald's approximation (2-7) we have that 



where I(6\, 6q) is the information number as defined by (2-23). Applying the 
last theorem, we obtain that N* (Page's procedure) satisfying Eo{N*} > cr 1 and 
E]{N} is asymptotically at most |loga|//(P 1 ,P 0 ) as a — » 0, and this is 

asymptotically the best we can do. In other words: 



Lorden also showed that we can simultaneously minimize E e {N} for each 6 
asymptotically as y-» ~ for a wide class of tests: 



Moustakides (Moustakides, 1986), extended these results to the non- 
asymptotic case where yis finite. 

F. PERFORMANCE ANALYSIS OF THE PAGE TEST 

In 1954, Page (Page, 1954) introduced a control chart procedure based on 
the repeated one-sided Wald-SPRT test with boundaries (O^i), zero being the 
renewal boundary and a the stopping boundary. 

Let the problem formulation be according to (2-26), that is, 



and let L(s,6) be the ARL of this test with initial score S 0 = s when {.v,} are i.i.d. 
F(.v | 6) distributed, i.e.. 



E,{N} - |loga|/ /( 0 ,, 0 O ) as a 0 




as y = a 1 — » «*>. (2-41) 




as y -> °o. 
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L(s,8) = e| N* |s 0 = s,0j. 



Consider now Wald's test 

N = inf {n: S„ < 0 or S„> a}. 

Similarly, let L w (s,8 ) denote the ARL of Wald's test, 

L w (s,6) = E{N |S O =S,0} 

and let Q w (s,6) be the Operating Characteristic (OC) of the same Wald test, that 
is 



Lorden (Lorden, 1971) and Benveniste (Benveniste, Ed., 1986) showed that 
the least favorable delay for detection, D, as defined by Lorden for Page's test 
occurs when the test statistic is zero when the change or the disorder occurs, 
i.e., Su-i =0, since the test statistic has the longest path to travel towards the 
stopping boundary. Thus, the ARL function with initial score S 0 = 0, 
determines both the false alarm rate T and the delay for detection D as given 



Qu>(S/$) ~ P(Sn < 0 1 So — s,6). 



Then, the ARL of Page test L(s,0 ) is given (Page, 1954) 



Us, = i Q n S !ni) ^ 0 ' ^ + 



(2-42) 




(2-43) 



(2-44) 



Possible ideal and real ARL functions are presented in Figure 2.9. 
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(a) Real and Ideal ARL Functions for Testing 6 < 6q against 9 > 6q 




(b) Real and Ideal ARL Functions for Testing 0 = 0o against 9 * 6 q 
Figure 2.9. Possible Ideal and Real ARL Functions 

Remark: The local properties of cumsum tests can be measured in terms of 
the derivative of the function L(6) at do, since the local properties of a test 
measures the test's performance as 6\— >8q (when the statistical "distance" 
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between the two hypotheses tends to zero), meaning weak signals or a low 
signal to noise ratio case. 

It was shown (Nikiforov, 1986) that convenient measures can be defined 
by using the derivative £ of the ARL function for determining a local 
approximation (see Figure 2.8), where £ is defined as 

6 30 

and if £ = 0, the local approximation is given by (2 - 45) 

3 2 L(0) 

de 2 

This forms the basis for what is called in the sequel the local approach for 
cumsum tests, resulting in local sequential tests. 

1. The Lorden Approximations 

As shown in the last section and given by (2-40), Lorden established 
bounds on the delay for detection and the false alarm rate for Page's cumsum 
test in terms of the Wald sequential test. 

Lorden's theorem (2-40) can be applied to Page's test with nonlinearity 
g(x) and a zero score to obtain a new bound. Using Wald’s lower bound (2-16) 
as derived for his one-sided sequential test we obtain 

a < exp(-/t(0o)} 

Thus, from (2-40), the mean time betw r een false alarm can be lower bounded 
by 

T = E 0 {N 1 > exp {h(do)-a), (2-46) 



if $ = 0. 

0 = 0 O 



if £ *0 

0 = 0 O 
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where h(6o) is the unique non-zero root of the moment generating function 
(2-11). Using (2-17) and applying Lorden's theorem, the delay for detection 
can be upper bounded by: 




(2-47) 



Equations (2-46) and (2-47) are known as Lorden’s bounds. 

Remark: Notice that the mean time between false alarms (2-46) is an 

exponential function of the stopping bound a. 

2. Wald Approximations 

Similar results can be obtained by using the approach proposed by 
Nikiforov (Nikiforov, 1986). Recall the approximation (2-14) and (2-15) 
obtained for the OC and ARL functions for the two-sided Wald sequential 
test. These approximations can be used with the modification btO (zero 
renewal boundary for Page's test). Once again, a lower bound for T and an 
upper bound for D will be derived. 

a. Lower Bound for T when E[g(x) | 0q) < 0: 



Notice that the right hand side is a decreasing function of Q(#o) (since 
d/ <fQ(Q/(l-Q)) > 0 and b < 0). Hence, using the upper bound for Q obtained 
for /i(Ao) > 0 given by (2-14), we obtain: 



Using Page's result (2-43) and the bound (2-15) yields: 



T - L(0,6 q ) = lim 



lim _UMoL 

*to i-Q(o,e 0 ) 




72 



T > lim 



bTo E {s(*K} 






/(^o) + ^‘ 



^(0 o )exp{^(g o )fl}-l 

l-exp{ji(0 o )fc} 



The last term is a function of b, but both the numerator and denominator 
approaches zero when fctO. Using L'Hopital's rule and using the fact that 
5(6q) > 1 we obtain (Broder, 1990): 



T> 









(2-48) 



where h(d 0 ) and j{6o) were defined in (2-11) and (2-15) respectively. 



b. Lower Bound for T when E(g(x) I 0o) = 0 
In a similar way we obtain (Broder, 1990): 

»tol-Q(o,8 0 ) 

> lim 1 ^[i-QWl+^QW 

«ol-Q(0 o ) E{j 2 (*K} 

„2 



E{s 2 Mi«o}' 



(2-49) 



Remark: By (2-49) the mean time between false alarms is a quadratic function 
of the stopping bound a. Recall that by (2-46) it was shown that when 
E{g(x) I $ 0 ) < 0, the mean time between false alarms is an exponential function 
of the stopping bound. Hence, once again it is demonstrated that it is 
worthwhile to bias the test to have a negative drift before the change. 
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resulting T as an exponential function of the stopping bound instead of a 
quadratic one (Broder, 1990). 



c. Upper Bound for D 

Consider now the delay for detection. Since that for detecting 
upward change, after the disorder £{#(*) | #i) > 0 results in h(6\) < 0. Using (2- 
44), (2-14) and (2-15) in the same manner as before we obtain: 

D = L(O,0j) 

.. W 
i-Q(o,e,) 

, Um i f<» + r(^ l )I 1 -0(g,)1+i’Q(g,) 

»toi-Q(e,) E{s(i)|e,} 



Once again, since the right hand side is a decreasing function of Q((?i), and 
since b < 0 the inequality is preserved. Thus, as Q(0i)iO, we can replace Q( 6 1 ) 
with zero and obtain 



D < 



a + yUh) 

E[g(x} I 0i) 



(2-50) 



which is consistent with Lorden's bound (2-47). 

3. Asymptotic Performance and Measures 

For all disorder detection schemes the pair (T,D) determines the 
detector performance, just as Pd versus PpA ( Pd being the probability of 
detection and Pfa is the probability of false alarm) determines the 
performance of a classical detector. 



74 



As shown, the ARL function determines uniquely the pair (T,D). 
Thus, it is in our interest to examine its asymptotic performance. Hinkley 
(Hinkley, 1972) used an asymptotic performance measure for the nonlinearity 
g(x) while using Pq and Pfa as performance criteria. This measure was 
derived while calculating the efficiency of the cumsum procedure. The 
proposed measure was 



Recently (Broder, 1990), another performance measure was proposed 
resulting in an alternative technique which allows recursive computation of 
the ARL, the stopping bound a increases, avoiding the complicated 
numerical integration needed to generate the performance curves (solution 
of Fredholm type integral equations). 

a. Asymptotic Approximation of the ARL Function 

Central limit theorem for renewal processes. (Ross, 1989): For large t, N(t) 

being a renewal process is approximately normally distributed with mean ~ 

H' 

and variance to 2 / p?, where p (p * 0) and o are respectively the mean and the 
variance of the interarrival distribution. 



log£{exp{-MAo)-g(*)} I 0i). 




(2-51) 



where <P(x) is the Gaussian cumulative distribution function: 




— oo 
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Khan (Khan, 1981) used this result to show the asymptotic 
normality (in the sense of a — »®°) of the run length of Page's test with a 
stopping boundary a under P(&\): 

define: fi = E{j(jr)|0, } <r = var[ s (i)|6>,] 

m- a / u 



then: 



^acr / /r 



->N( 0,1) 



where N(0,1) is a Gaussian distribution with zero mean and a unit variance. 
Using this asymptotic distribution and the results derived in (2-38), a new 
approximation can be established for the asymptotic probability that the 
average delay is less than a given threshold: 



Pr{L(0 1 )<x}- 0 



x j-a_/ji 



(2-52) 



b. Alternative Asymptotic Performance Evaluation 

An alternative way to evaluate the Page test under different 
nonlinearities for various noise distributions has been shown by Broder 
(Broder, 1990). Define an asymptotic performance measure 

log ARL(6? 0 ) 



lim 

ARL 0 - 



ARL(0j) 



lim MI) 

T->°° D 



log (T) 



lim 

a — >oo D 



(2-53) 
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Notice that for 77 to reach a finite bound, both T and D approach infinity as 
a -»■ oo r thus 77 reflects the asymptotic performance and is the reciprocal of the 
slope of the (D, logT) performance curve. This performance measure can be 
interpreted in two ways: First, as the ratio of the run-length for the two 

hypotheses. Hence, the larger 77 indicates better asymptotic performance; 
second, as an Asymptotic Relative Efficiency (ARE) between the two tests. 
Recall that for a fixed mean time between false alarms (large enough): 

logfi) 

ARE, 2 = — = I™ 

' ik r,— log(T 2 ) D, L, 6 ,) 

Ti — ♦ oo 

D 2 

hence, resulting in the delay ratio of the two tests. 

Using Lorden's approximations (2-46) and (2-47) for the Page test, 
and ignoring the "excess over the boundaries" we get: 

logT > h(6 0 )a 
D < — T 

hence 



(2-54) 



7i>/i(0 o )e{sM|<M = 5. < 2 - 55 ) 

This lower bound 77 can be defined as the asymptotic performance measure, 
thus, being a convenient way to "measure" Page's test using different 
nonlinearities g(x) for various noise distributions. 
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Notice that the lower bound 77 can be used as an alternative way 
to estimate the expected delay given the desired large mean time between 
false alarms. 

Property: If g is the log-likelihood ratio, g(x) = log[(/(x I 6 \)/i(x\ 0o)L then 
under any noise distribution the bound is tight, i.e., 77 = 77 = 1(6\,6q) where 
1(6 1 , 6 b) is the Kullback-Liebler information number (2-23). 

Proof: Since 



ex Pl log 



/W 0 i) 



f(A 0 o) 



\o 0 



J -/*W n 1 ; 



using the moment generating function identity ( 2 - 11 ) it becomes obvious that 

h(0o)* 1 . 



Hence, 
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Recall that by definition: T = E 0 {N*} > a 1 

where a is the Probability of false alarm. Hence 

log or 1 

q ~ 1(6 1, $0) oc —> 0 

log or 1 

or D ‘m^) a ^°- (2 ' 56> 



Hence, (2-55) can be seen as a generalization of Lorden's result (2-41) for any 
nonlinearity function. The root h(6o) of the moment generating function 
identity "scales" (2-41), thus (2-55) establishes a general bound which can be 
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evaluated for any nonlinearity gOc) under any noise environment. We will 
be interested in the cases where the strict equality exists ( 77 = 77 ), which enables 
us to get a precise relationship between the delay and the false alarm rate. In 
the next chapter we will see some examples for which 77 = 77 . 

G. SUMMARY 

In this chapter we show that detecting a disorder presented in the 
multiple hypothesis framework ( 1 - 2 ) can be done by using cumsum type 
procedures. One of these procedures, called the Page test, was presented and 
investigated in depth. Using renewal theory and ladder variables we present 
a new technique to observe the properties of Page's test. Three observations 
are shown: first, it is worthwhile to bias the test if it is known that before the 
disorder the mean of the statistic is zero; second. Page's test behaves like an 
adaptive detector in the sense that the ladder epochs form a local minima (or 
maxima) process in which the past observations which do not contribute 
relevant information about the change are forgotten. Finally, we showed the 
equivalent representation of Page's test in the off-line and on-line versions. 

Page's test implemented with the log-likelihood nonlinearity is shown to 
be the MLE of the change time (within the multiple hypotheses testing (1-2) 
framework). Using Lorden's results, the asymptotic optimality of Page's test is 
obtained in the sense that Page's test implemented with the log-likelihood 
nonlinearity is the optimal stopping rule, that is, the average delay for 
detection subject to the false alarm rate which tends to zero is minimized. 
Thus, the log-likelihood nonlinearity is shown to be the optimal 



79 



nonlinearity, and therefore. Page's test, which is the MLE for this case, is 
shown to be the quickest detector for the disorder problem. 

Finally, performance evaluation of Page's test was derived. The main 
results are what is called the Lorden approximations for the mean time 
between false alarms T (2-46) and delay D(2-47) and similarly, the Wald 
approximations for T (2-48) and D(2-50). 

In addition, using Broder's results, the asymptotic performance measure 
is shown to be lower bounded. The problem of how informative the bound is 
for different nonlinearities will be analyzed in the next chapter. Here we 
show that for the optimal nonlinearity the log-likelihood, the bound is tight, 
i.e., the bound provides all the information needed to predict Page test 
performance. Finally, a new simple generalization of Lorden's result was 
shown for any nonlinearity function in any noise environment. 
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III. THE APPLICATION OF PAGE'S TEST TO PARAMETRIC AND 
NONPARAMETRIC CHANGE DETECTION 



A. INTRODUCTION 

In the last chapter it is shown that implementing the Page's test with the 
log-likelihood ratio nonlinearity results in the Maximum Likelihood 
Estimator (MLE) of the change time. Furthermore, it is the quickest detector 
of the disorder. The problem becomes much more complicated when the 
model parameters after the change are not known. In this case, the unknown 
random variables, v the change time, and the model parameters 6, have to be 
estimated. Thus the detection problem can be presented in the estimation 
framework. Joint estimation of v and 6 is a very difficult task because the 
disorder occurs at an unknown time and the presence of several unknown 
parameters forces the use of suboptimal detectors. Hereby, we present some 
competitive ad-hoc methods used for detection and if possible also estimation 
of the change time and the model's parameters. 

1. Likelihood Oriented Methods 

In situations such as detection of an unknown change magnitude of 
Gaussian signals it is possible to perform the joint estimation of v and the 
unknown parameters 6 (Basseville, 1988). In such cases, the detection 
approach consists of replacing the unknown jump magnitude of the model 
parameter by its MLE. The Generalized Likelihood Ratio (GLR) test of the 
joint estimation becomes 
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Hi 

max maxS”(v, 0i) > A 

l<v<n ft r rf 

H 0 

where S”(v,#i) 1S log-likelihood cumsum statistic. This double 
maximization problem of estimating both the change time and the 
parameters is reduced to a single maximization of the cumulative sum since 
the Gaussian characteristic of the signal to be detected allows an explicit 
solution as a function of d\ for the likelihood ratio test. (Basseville and 
Benveniste, Eds., 1986, Chapter 1). Hence the change time estimate becomes 

v(r) = argminS"(v,0 1 ). 

V ' ' 

This property is still valid in a more general situation when we consider the 
problem of detecting additive changes in linear models described in state- 
space representation and leads to an efficient change detection algorithm with 
reasonable computing cost. An earlier approach consists of monitoring the 
innovations of a Kalman filter, because of the linear property of the system 
and additive effect of the change on the system, it may be shown (Willsky and 
Jones, 1976) that the effect on the innovation is also additive. Moreover, the 
Gaussian characteristic of the state and observation noises which ensures for 
an explicit solution in Q\ for a likelihood ratio test, is still valid in this 
situation. These points were explored by Willsky and Jones (1976) who 
derived a recursive algorithm for the GLR test computed for the innovation 
of a Kalman filter designed under the no change hypothesis. The distribution 
of these innovations with respect to its past values, thus, the cumulative sum 
to be computed in this case is in the form of 
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#W P «,)=X'° 8 



k=i 



L (**!**- i/---^o) 

=i ^ P 6o( x k\ x k-\'---' x o) 



where Pq j reflects the change in a certain parameter (change in the mean, 
variance, etc.). The GLR test is then 



As mentioned above, the maximization over 6\ is explicit because of the 
Gaussian assumptions of white noise and additive changes, hence the test for 
the change time is reduced to a single authorization even in this more 
general situation. 

In the case of detecting changes in model eigenstructure such as 
changes in AR or ARMA models or equivalently in the state transition 
matrix of a state-space representation of a model, the problem of the joint 
estimation of the change time v and the changing parameters is more 
complicated. At this point we need to distinguish between two types of 
situations: in the first case, if the signal or system is known to have the same 
behavior as an AR or ARMA process, then the model is descriptive enough 
for its parameters behavior to be detected (Basseville, 1988). The second case 
reflects a situation where the signal or system is not known and the main 
issue is to detect changes in the eigenstructure, then the AR or ARMA 
models are nothing but a tool for detection of such changes. 
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2. Simplification of GLR Tests (Two Models Approach) 

In the case of segmentation of signals resulting from AR models such 
as speech segmentation (Andre-Obrecht, 1988) the detection of abrupt changes 
in the AR parameters is performed via the comparison between a long-term 
model Mo identified in a growing window and a short-term model Mi 
identified in a sliding window of fixed length. (See Figure 3.1) This method 
is shown to be a simplification of the GLR test since for implementing the 
GLR, the maximization over 8 (the AR vector parameter) is no longer explicit 
because the change is not additive on the observation. Moreover in the case 
of ARMA models, the cumsum is no longer linear in the parameter, 
therefore the test becomes quite expensive since for each possible change time 
r we need to use the data {r, r+1, ..., n) for identifying the AR model Mi after 
the change and compute the log-likelihood ratio cumsum s", then maximize 
over r. In the case of AR models, this method is not only expensive but leads 
also to boundary problems (Deshayes and Picard, 1986). 

The two model approach simplifies the GLR test by using a fixed 
length sliding window as opposed to varying length windows needed to 
implement the GLR test. Different statistical distance measures between the 
long-term and the short-level models were proposed by Appel and Brandt 
(1983) and Segen and Sanderson (1980), Basseville and Benveniste (1983, 
1986), Ishii and Iwata (1979) and Andre-Obrecht (1988). Most of these 
measures are based upon innovation testing which in turn is based upon the 
conditional distribution of the observations. 
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a . Scheme for the GLR test. 
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b . Using two models, one “global” and one local 



Figure 3.1. Schemes for 
a. GLR test 

b. Two Model Method 



3. The Statistical Local Approach 

Another approach for overcoming the drawbacks of the GLR tests is 
known as the "local approach" and has been introduced in change detection 
problems by Nikiforov (1983, 1986) for on-line detection for AR models. 

The original idea of Nikiforov consists of looking for small changes 
in AR or ARMA models and using the Taylor expansion of the log-likelihood 
function. His method results in a statistic function 




6 = 6q 
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In other words, instead of monitoring the observation process {x „} or the 
innovation process, the local approach monitors g(x). The key result (Deshays 
and Picard, 1986) is that there exists a central limit theorem for g(x), for which 
any change in 9 is reflected in a change in the mean of g(x) for which Page's 
test or the GLR tests can be used. Nikiforov derived two algorithms based 
upon cumsum tests for two different priors about the change directions. 
Different applications for these algorithms are described in Nikiforov 
(Nikiforov, 1986). Another use of these methods is in the area of recursive 
parameter identification. Benveniste (Benveniste, 1987) has shown that for 
any general recursive parameter identification algorithm 

0n = 0 n - \ + YnH n {e„-i,x„), 

where y n denotes the varying gain and H„ denotes the statistic, applying the 
local approach to the statistics H n (6o,x n ) where is a reference model, enables 
one to transform the problem of changes in the parameter vector 6 into the 
problem of detecting a change in the mean value of an asymptotically 
Gaussian distributed process which is a cumsum of the function H( ). 

Finally, Basseville (Basseville, 1987) and Benveniste (Benveniste, 
1987) introduced another use of the local approach technique. In the case of 
detecting changes in the AR part of a multivariable ARMA process having 
unknown and time varying MA coefficients. Because the Fisher information 
matrix for an ARMA process is not block diagonal with respect to the AR and 
MA parameters (because of the coupling between the unknown monitored 
parameters and the unknown changing MA parameters), neither the 
likelihood function nor its Taylor's expansion (local approach) can be used. 
By using instrumental statistics on the observations (Benveniste and 
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Basseville and Moustakides, 1987), the changes in the AR portion are reflected 
in changes in the mean of the instrumental statistics. By looking for "small" 
changes in the AR coefficients, the local approach statistic, i.e., Taylor's 
expansion of the instrumental statistics results in an % 2 test 

Hi 

ulz?u n > x 

Ho 

where U n is the instrumental statistic vector (which is asymptotically 
Gaussian) and I n is its covariance matrix. 

B. ORGANIZATION OF THIS CHAPTER 

In the introduction section, different competitive methods of Page's test 
were briefly described. Some of them enables one to detect (or estimate) the 
change time together with estimation of the changed parameters. Now, we 
will only be concerned with the quickest disorder (change) detection problem. 
In the case of implementing the log-likelihood nonlinearity, the Page test is 
the optimal (quickest detector) for the disorder problem but assumes that the 
observations are i.i.d. distributed with one distribution before the disorder 
and another distribution after the disorder. However, in the case that the 
i.i.d. assumption does not hold, other detection schemes "tuned" to the 
specific problem may perform better than the suboptimal Page test. Despite 
these concerns about Page's test, the test will be shown to detect the change 
instants occurring at random times very efficiently. This chapter focuses on 
general implementation of Page's test for both parametric and non-parametric 
detection, and evaluation of the test's performance for the implemented 
nonlinearities, by using the results in Chapter II. 
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Section C introduces the case of detecting jumps in the mean of Gaussian 
distributed observations. Both upward and downward directions are 
considered as well as the case when the change magnitude is unknown. 
Page's test implemented for this problem, (derived from the on-line point of 
view) and the GLR test (derived from the off-line point of view) are shown to 
be the same. 

In Section D, performance evaluation for Page's test is evaluated for 
different nonlinearities in Gaussian and Gauss-Gauss mixture noise 
environments. In particular we are interested in the cases where the lower 
asymptotic performance bound 77 is tight (i.e., 77 = 77). In the parametric 
framework, the problem of detecting changes in the mean and variance of 
Gaussian observations is shown to result in 77 = 77 for which the performance 
measure is easily computed. As a second example we consider a suboptimal 
implementation of Page's test where the distribution after the disorder is not 
known and by the use of composite hypothesis technique, a new test is 
derived. This local optimum detector is based on Wolcin's test (Wolcin, 1983) 
and a modification (Broder, 1990), and is modified to detect energy changes 
occurring within frequency "windows." New performance results are 
obtained and shown to be consistent with the simulation results. Finally, in 
the nonparametric framework, the sign test is analyzed by using results from 
random walk theory, and shown to have the property 77 = 77 . 

Section E summarizes the main results of this chapter. 
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C DETECTING JUMPS IN THE MEAN 

We begin in this section with the simplest application of the Page test, 
namely the problem of a change in the mean of independent identically 
distributed Gaussian random variables. This problem is an important one 
since, as will be shown in the sequel, many complicated problems involving 
abrupt changes in the eigenstructure (parameter changes) can be converted to 
the problem of change in the mean. Two cases are considered: the first, when 
the means before and after the change are known, and secondly when the 
means and therefore the change magnitude is unknown. 

1. Known Means before and after the Change 

Let {e„} be a Gaussian white noise sequence with variance a 2 , and let 
{x„} be the observation sequence such that 

x n =n„+e n n = 1, 2, ..., N 

where: 

fi o if n < v— 1 
jj i if n > v’. 

Consider now the likelihood ratio test between the "no change" hypothesis: 

H 0 : v> N 

versus the "change" hypothesis: 

Hi: v < N. 

Thus, the log-likelihood ratio between these two hypotheses has the 
following form (Basseville, 1988): 
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(3-1) 



therefore, its logarithm is given by: 

_ M\-Vo N ' 



^0+^1 



1 nL(v)-^X[r r 2 



k=v' 



1 cN 



=-^S^(^,A) (3-2) 



where A = is the magnitude of the jump, and 



S } i {n,A) = A- y £(x k -H~j ) 

k=i K z 



Replacing the unknown jump time V' by its maximum likelihood estimate 
under Hi yields: 



v = arg min 

l<v<N 



nWmK) 

U= o Jt= V 



= argminS^ (hq,A). 

1< v<N T 



Thus, we get the following change detector: 

Hi 

£n4L(v) = max S^(^o^) < ( 3 ~ 3 ) 

v H 0 



where a is a threshold properly chosen as addressed in Section C. 

This detector can be described also as follows: detection occurs the 
first time at which 

£N =Sj V (^0"4)- min Si{V0'A)>a (3-4) 

which is nothing but the Page-Hinkley stopping rule or cumsum algorithm, 
and may be computed in the following recursive manner: 
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(3-5) 



Thus, both Page's stopping rule (derived from the on-line viewpoint) and the 
generalized likelihood ratio (GLR) test (derived from the off-line viewpoint) 
are identical. The behavior of the Page-Hinkley stopping rule is depicted in 
Figure 3.2. 

2. Unknown Magnitude of Change 

In this case we may assume that no is known, but n\ is not. A 
minimum jump magnitude 4 m j n to be detected is fixed a priori. Two tests are 
running in parallel corresponding to two possible directions (increasing or 
decreasing mean). 

For detecting a decrease in the mean we determine the stopping time 
N by observing when the maxima process drops down by a, the detection 
threshold (see Figure 3.2). 



N = inf) n: max 

l \<k<n 




where 




(3-6) 



So =0. 



Similarly for detecting an increase in the mean we define 



N = inf) n : min S k - S n > a 

[ \<k<n 



where 




(3-7) 



Sq = 0 
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Figure 3.2. Page-Hinkley Stopping Rule as the Process of 
Global Minima (for a) and Global Maxima (for b) 

a. Detecting Upward Change 

b. Detecting Downward Change 



The change time v is estimated to be the last maximum time before 
detection. Similarly, the change time v is estimated to be the last minimum 
before the stopping time. Notice that this test corresponds to a linear 
transformation g(x) = x as described in Chapter II with bias terms: /J. o ± k. 
Figure 3.3 illustrates the recursive version of Page's test when ^min, the 
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unknown change magnitude is set to be 0.2, the change time is at 100 and the 
SNR of the input signal is -3dB. 

D. PERFORMANCE EVALUATION OF CUMSUM PROCEDURES 

As shown in Chapter H, we characterize the performance by the mean time 
between false alarms T, and the mean delay for detection D. The asymptotic 
ratio between log T and D was shown to be defined by (2-53) and (2-55): 

H = liirt * h{6 0 )- E{g(s)l0i} = V (3-8) 

T^> co U(a) 
a— 

where a is the threshold of the test. This relationship is influenced by two 
factors: The first is the transformation or nonlinearity g(x). The second is the 
statistical properties of the observation before the change (the root h(8o) is a 
function of the SNR) and after the change (E{g(x) I $i)). 

In the sequel, several nonlinearities are presented and analyzed in 
different noise environments. Special attention is given to these situations 
which result in equality in (3-8), namely: 

riS. i DW i = '' (eb) ’ E{ * ( * )l8l} 

>oo 

resulting in an easy way to calculate the asymptotic performance of the 
detector. 

Notice that for the cases where (3-8) is an equality, the relationship 
(logT)/D enables a comparison of Lorden bounds (2-46), (2-47) and Wald's 
bounds (2-48), (2-50) for the pair (T,D) with the correct performance measure 
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Changing Mean, Gaussian Observations 




Page Test: Changing Mean Gaussian Model 




Figure 3.3. Detecting a Change in the Mean of Gaussian Observations using 

Page's Test 
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(3-10). Both bounds are based on the root of the moment generating function 
h(Oo) and the statistics after the disorder £{#(*) I &[}. The following subsections 
present some examples for which performance curves specified by calculating 
pairs of (T,D) for many values of a, the stopping boundary, and k, the bias 
term. Thus, the use of the approximating equations for (T,D) enables us to 
find the pair ( a,k ) for a given performance requirement (T,D). 

1. Parametric Detection 

For parametric detection schemes it is assumed that the general form 
of the statistics before and after the change is known. If the parameters after 
the change are not known, composite testing techniques could be used as 
shown in the sequel. 

To illustrate the performance curves, we consider the situations 
where the noise distributions before and after the change are both Gaussian 



and also the case where both densities are Gauss-Gauss mixtures (Kassam, 
1987): 



The Gauss-Gauss mixture density is the first two terms in 
Middleton's Class A model where the noise density function is modeled by an 
infinite weighted sum of Gaussian densities with decreasing weights and 
increasing variances, and has been used to model interfering waveforms 



P(x) = - , 1 2 exp(-s 2 / 2 o 2 ) 




(3-9) 
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(pulses) and narrowband noise. The parameter c indicates the amount of 

contamination and is typically in the range (0,0.25). For small enough values 

2 

of £, the behavior of P(.v) near the origin is dominated by that of o\. For large 

values of l.vl, Cq. dominates the behavior of P(.v) since its tails decay at a 

2 

slower rate than do those of ctq. Thus, the relative strength of the 
contamination is given by the power ratio y -<7f / ctq. Adjusting the 
parameters (£,y) we can determine the performance of the cumsum 
procedures for a wide range of distributions including those with heavy tails. 

A second disorder situation results in the assumption that before the 
disorder Pq(.y) is Gaussian while after the disorder P\(x) is a Gauss-Gauss 
mixture. We consider the linear detector y(.v) = .v, and the nonlinear log- 
likelihood detector and the local optimal energy detector g(x) = .r-1. 

a. Detecting Disorder in Gaussian Measurements 

Ify(.v) is the log-likelihood nonlinearity, then it has been shown 
in Chapter II that in the limiting situation the bound is tight, i.e., = rj = rj 
where 

n= UmtSl So) 

7 — >=« U 

where 1(8\,6q) is the Kullback-Liebler number defined in (2-23). In the case of 
a change in the mean, i.e., Xj ~ N(fiQ,cr) for i < v, X,- ~ N(fii,cr) for i > v, the 
Kullback-Liebler information number is given by (Therrien, 19S9) 

t] = lint (logT /D) 

T— »=*= 

= (^) : /2o 2 (3-10) 
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with 



An = n i-//o- 

Thus the result can be directly related to the signal to noise ratio An I o. 
Notice that this result is consistent with the result obtained in (3-2) for 
detecting jumps in the mean of i.i.d. Gaussian observations using the 
nonlinearity 

g(*) = ^y(*-^0-^W 2 ) 

which results from the log-likelihood ratio test. For this nonlinearity: 

v = E{g(x)\e 1} 

= -no ~ A n/ 2) 

= (An ) 2 / 2 o 2 . 

In the case of detecting a change in the variance of zero mean 
Gaussian i.i.d. observations, the log-likelihood ratio results in a square law 
type detector and is given by 

g(x) = cx + In y 

where 

1 qf - 05 Oq 

L ~ n 0 2 ' / — - 

2 qfqo 

Notice that for detecting an upward change (y< 1), c is positive and lnyis 
negative, while for detecting a downward change (y> 1), c is negative and lny 
is positive. This explains the behavior of the Page test as illustrated in Figure 
3.5. Thus, the performance measure is given by 
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(3-11) 



V = E{g(x)W 1} 

= c af + lny 

= \[y~ 2 - i]+inr 

which is as expected the Kullback-Liebler information number for this case. 

The closed form in which the asymptotic performance measure is 
given allows one to compute easily the performance curves. 

Figure 3.4 illustrates the performance curves when detecting a 
disorder in the mean of Gaussian measurement (as illustrated by Figure 3.3) 
using the optimal nonlinearity g(x ) = ^-r {* ~ IM)- Af, f / 2) for different signal to 

noise ratios (Equation 3-10). The predicted results obtained for the delay as a 
function of a given SNR agrees with the simulation results shown in Figure 
3.3 within a tolerance of up to 10 samples. 

Figure 3.5 illustrates a changing variance Gaussian signal with 
y= 1.2 (downward change), and change time at 150. Also, the optimal Page 
test using the square law nonlinearity g(x) = cx 2 + lny applied to this signal is 
shown. Notice that in this case of y> 1, F{g(x) I $3} < 0 while E{g(x) I 6\] > 0 as 
needed. 

Figure 3.6 illustrates the performance curves for the square law 
detector nonlinearity (3-11). Notice that when — xto which means that the 

changes become undetectable, the bound given by Equation (3-11) turn to be 
noninformative since 17 — » 0. Thus, the delays obtained for values of 7 
approaching 1 are higher than those obtained for values of 7 which are 
distant from 1. Notice also the bell curve shape of the ARL function for this 
detection scheme (which is consistent with the example shown in Figure 2.9). 
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g(x)= log(fl/fO), Changing Mean, Gaussian noise 
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Figure 3.4. Performance Curves for Page's Test Implemented with the Linear 

All 

Detector gix) =-^ix-fi 0 -Aii/2) 
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Figure 3.5. Changing Variance Gaussian Signal and the Corresponding Page 
Test Implemented with the Square Law Nonlinearity 
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Figure 3.6. Performance Curve Obtained for Page's Test Implemented with 
the Square Law Detector g(x) = cx 2 +lny 
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b. Locally Optimum Energy Detector for Spectral Signatures 

Consider the case where we observe the energy spectral density of 
a signal. Under the "no change" hypothesis we assume without loss of 
generality that the background noise process {y} is normalized (i.e. cr= 1) 
White Gaussian Noise (WGN) and is grouped in disjoint blocks of M points 
for processing via the Discrete Fourier Transform (DFT). Hereby we assume 
that the sample blocks are mutually independent. The squared magnitudes of 
the M complex outputs of the DFT are computed and these random outputs 
denoted by {X,- m }, i = 1, 2, ... , m = 1, 2, ..., M where i is the block number and 

m is the frequency bin number, form the Periodogram and are available as 
the observations for the detection procedures. Namely, 



Hereby, we are interested in detecting a change within a specific frequency bin, 
while the method described here can be also used to detect a change from 
block to block as was done by Broder (Broder, 1990), thus, our method 
modifies Wolcin's method (Wolcin, 1983) by looking for a change in an 
orthogonal direction (frequency) to the direction (block) used by Wolcin's. 
Moreover, we use a narrow "window" of frequencies to detect changes within 
several frequency bins in order to detect a certain spectral signature. 

Under the white Gaussian noise assumption, the variables {X 1/m } 
except the first one m = 1 the first frequency bin, are independent and 




where 



yi,m=y( iM + m ) * = 1/2,... tn = 1,2,...,M. 



102 



identically distributed with exponential distribution and unity mean, having 
2 

the Xi distribution (Kay, 1988) 

Under the change hypothesis, the distribution of {X;, m } containing the signal 

in addition to the WGN, will also be presumed to be exponential but now 

with mean > 1. This is due to the fact that under the change hypothesis 

2 

Xi m has a noncentral %\ distribution with a noncentral parameter (Whalen, 

2 

1971) A > 0, thus the mean n of the non central Xi distribution is given by 

fj. = A+l > 1. 

Thus, if we assume that after the disorder ft >m does not depend on i, we have 

P\{Xi,m) = Vmexp{- X im/Vml 

This is the case when the signal itself is also a Gaussian signal which is 
independent of the background WGN. Hence, the original hypothesis testing 
of 

Ho- {y«} - WGN 
versus 

H] ; {y,} ~ Gaussian signal + WGN 

in the signal domain, is equivalent to the hypothesis testing in the spectral 
domain. 
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versus 



(3-12) 



^ i,m ~ Pm ®*p{ ^i,m / A*m} , — 1/2,..., TTl — 

Pm > 



Because the parameters {p m ) are not knowm a priori, Page's test with the 
optimal nonlinearity the log-likelihood ratio cannot be implemented. Thus, 
we will use composite hypothesis techniques such as the Locally Most 
Powerful (LMP) test statistic. In Chapter II we introduced the local optimum 
nonlinearity. 



8eo{x) = -^P{x;e)/ P(x;G ) 

d ^P(x;6 + AG) 
~Je P(;t;0) 



AG = 0. 



where P(x;G) denotes the observations density conditioned on the parameter 
6. This test measures small deviations from the "null" hypothesis, hence, as 
was shown in Chapter II, it maximizes the efficacy (incremental signal to 
noise ratio) of the test. 

Using this function for testing between the hypotheses p = 1 and 
p > 1 for the case of univariate exponential distributions yields the following 
nonlinearity 



toW= !,W 08 



P(xl/i > 1) 
P(x\n = 1) 



= lim — log 
juil dfj. 



/i l exp{-x/n} 
exp{-x} 



= Ui FT~(~ 1o § V-x/V+x) 

ixi\ dfi 




= x — l. 



Thus, gt 0 (x) does not depend on fj. after the change, this results in a locally 
most powerful test for all fi > 1. 

Implementing Page's test for bin number m yields 

^i,m = max jo,Sj-_] <m + 

So,m = 0 



where x(Xi, m )=X,. m -l-*: (3-13) 

where k is a positive parameter or reference value needed to bias the test for 
the null hypothesis, such that E{g(x,> ) I p' m = 1} c 0, since Page's test performs 
better when the mean of the nonlinearity before the change takes place is 
negative as opposed to zero. 

At this point it is important to notice a robustness property of this 
detector. Since the method is based upon detecting changes in the energy 
(periodogram), and since it is assumed that the disorder is independent of the 
background noise, the presence of the signal with a certain frequency 
component will increase the total energy in the corresponding frequency bin 
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which is to be detected. Hence, the underlying signal model should not 
assume a specific model for the signal. 

The performance of Page's test (3-13) is determined by the 
parameters a and k. Hence, with two degrees of freedom the test results in 
many pairs ( a,k ) that yield the same performance. The problem is to find a 
specific pair which results in a high detection probability. In order to 
determine the performance in this situation, notice that since the parameter 
H is not known a priori, the performance measure r\ cannot be determined 
since E{g(x) I #i) is not explicitly known. Thus, we shall use Lorden's bounds 
(2^16), (2-47) and Wald's bounds (2-48), (2-50) to obtain informative bounds 
for the false alarm rate. In order to obtain these bounds it is necessary to find 
the root h of the moment generating function identity (2-11) before the 
disorder (Broder, 1990). 

1 = Ejexpjfr • = lj m = 1,...,M. 

= Ejexpj/i • [x Um - 1 - k]] \n m = l} 

= exp{-/i(l + £)} • E { ex p{ hX i>m }K = 

= exp{-/i(l + k)] / (1 - h). (3 - 14) 

The root is shown to be a function of the bias term k. Figure 3.7 
illustrates this relationship. Notice that the root h does not depend on the 
DFT length. This fact will be shown to be the key to the surprising 
observation shown in the sequel that the SNR per bin does not depend on the 
DFT length. 
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Figure 3.7. The Root of the Moment Generating Function Identity (2-11) for 
gix) = x-l-k as a Function of the Bias Term k 

Notice that the root is upper bounded by h < 1. Recall that by 
(2-46) T > exp{h(6o) a}. Thus, large values of h are desired. Recall also that 77 is 
lower bounded by r\. Consequently, for a given false alarm rate, a larger 77 
corresponds to a smaller delay. Thus, from (2-55) it is clear that larger values 
of h are desired, which means that biasing the test with larger values is 
favorable. 

In order to improve the poor statistical properties of the 
periodogram (standard deviation of the order of the mean), a window of 
length W = 3 that groups the expected frequency bin and the two neighboring 
frequency bins w r as taken. Thus the statistic function g(x) was modified to 
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(3-15) 




where m t , m t+ m t + 2 , are the frequency bins used by the window. A typical 
time/ frequency sample grid is shown in Figure 3.8. 




Notice that in this case the root location depends on the window 
length since for that case the moment generating function has the form 
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i m * +2 , 

'■t H*.-,*- 1 -*) 




1 = E< 


exp< 
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II 







3 

3-h 




(3-16) 



Figure 3.9 illustrates the root location for the given window W = 3. 



Moment Generating Function, Implementing Eq.(3-15) 




bias 



Figure 3.9. Root of Moment Generating Function for 

i ™t+ 2 



m=m( 
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Notice that the root is upper bounded by 3. This implies that the mean time 
between false alarms T will be larger in this case than the previous one, since 
with the same bias level a higher root value is obtained. However, if the 
averaging of the window frequencies were done by the function 



m-mi 

the root location would be the same as in (3-14), i.e., upper bounded by 1. This 
may imply that the averaging method (3-15) performs better than the others. 

The problem is that we would like to determine the performance 
with a given pair of (a,k), but since the function g(x ) was based upon a 
suboptimal hypothesis test, only bounds (Lorden and Wald) can be derived. 
To resolve this problem the following method is presented. 

Consider that we are given the desired mean time between false 
alarm T and some minimum value for fd, say /i m in(>l) of n m for which we 
can test. In this situation. Page's test using the optimal nonlinearity, the Log- 
Likelihood Ratio (LLR) can be implemented. This results in 




(3-17) 



where 



S( x ) = (1- A)x + logA 

^ = min 



-1 

min 



(3-18) 



and Page's test (3-13) is implemented with the function (3-15) and a new 
threshold a In order to find the relationship between the pairs (a,k) and 
(a',k) of the tests (3-13) and (3-15) we will use the following analysis: 



N (o = inf ji: S,- > a}, 

N LL r = inf {»: S; > a'}. 

Hence, we obtain 



^ implemented with (3- 13) 
Si implemented with (3-15) 



N LLR ~ + Xj : r m + / 0 ^m)] — a /{^ ^-m)} 

N to = inf{i: V, + X i>m - 1 - * > «}. (3-19) 

To achieve the same performance requires that the following relationships 
will sustain 



a = a'/l-X m 

k = log[A m /(l-A m )]-l. (3-20) 



Notice now that for the log-likelihood ratio function h(6o) = 1. Thus, for the 
given average time between false alarms, T, equation (2-46) becomes 

T > exp a'. 



Hence, the following procedure can be implemented: 

• given T, the threshold a' which guarantees that requirement is given 






a' = InT, 



(3-21) 



• use (3-20) to find both the threshold a and the bias k needed for 
implementing the local optimum test given T and fd m . Hence, this test 
is now "tuned" for the desired performance. 

To summarize, this procedure allows the use of optimal nonlinearity in order 
to find the specific pair ( a,k ) needed to achieve the performance requirement 
for the energy detector. 
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A second and even more practical way to determine the test 
parameters (a,k) is by using the SNR per bin which is required to meet the 
performance requirements. Hereby, the notation S relates to the signal and N 
to the noise (in the spectrum domain). Decomposition of the data yields 
(provided that energy exists only in one of the frequency bins) 



S + N: £{ S W|9,} = ^ 

° m-mi 

3 

= 1) / h(k). 



Thus, 



SNR = - 1 





(3-22) 



Notice that k and h(k) were determined to achieve a given lower bound for T, 
thus, rj given by (3— S) determines the asymptotic ratio for the desired pair 
( T,D ). Hence, using equation (3-22) enables us to find the corresponding SNR 
per bin which is required to achieve the desired performance. Figure 3.10 
shows the SNR required per bin as a function of the bias term k for different 
values of the asymptotic measure rj. Notice that each given k corresponds to 
a certain T, thus, the corresponding delay value, D, is found from the graph by 
using the assigned rj needed for certain SNR. 

Analyzing (3-22) reveals an important result. Larger values for Tj 
correspond to a smaller delay, D, in detecting a disorder. Thus, larger values 
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Bias 



SNR 




Figure 3.10. SNR per Bin for Energy Detector as a Function of the Bias k 
Implementing Nonlinearity (3-15) 
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of k are needed. If the function (3-13) had to be used, the SNR per frequency 
bin would remain the same. Thus, using (3-15) does not improve the 
minimal SNR required per bin to achieve some level of detection probability, 
but improves the overall performance by having a lower false alarm rate. 
However if we implement nonlinearity (3-17), decomposition of the signal 
and noise yields (provided that energy exists only in one of the frequency 
bins) 

m /+2 

E{s(X,'.m)h}= I (E{X,>}-1-*) 

m=mi 

= H -l -3k 

2 



hence, the minimal SNR per frequency bin is given by 

SNR = /i - 1 



_n_ 

h(k) 



+ 3 k. 



(3-23) 



Figure 3.11 illustrates the SNR function as a function of the bias k for the 
nonlinearity (3.17). Hence, there is an SNR improvement of the order of 
l-3dB. This is a surprising result because one would expect that since the root 
for (3-17) is upper bounded by one as opposed to the root of (3-15) which is 
upper bounded by 3, the overall performance of (3-15) will be better. Thus, a 
tradeoff between the delay and the minimal SNR required for detection is 
determined by the bias k. Moreover, analyzing (3-23) reveals an important 
result. Larger values for rj correspond to lower delay and better detector 
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Figure 3.11. SNR per Bin as a Function of the Bias k 
Implementing Nonlinearity (3-17) 
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performance, thus, larger values of k are needed. But this is opposed to 
having lower values of k which are needed to obtain the required SNR per 
bin. Hence, the chosen bias term k should reflect a tradeoff between these two 
conflicting requirements. 

Simulation results were done by using the function (3-17) data 
records of length 4000 samples where the change point was at sample 2000 
(i.e., middle of the record). The Nyquist frequency used was 500Hz, and at the 
change point the transition was from 62Hz to 156Hz. We used two 
algorithms, one of which uses a 32-point DFT producing a time/frequency 
grid of (125x32) points and the other uses a 128-point DFT producing a 
time/frequency grid of (30x128) points, where the corresponding processing 
gains are 12dB and 18dB respectively. Hence, using Figure 3.10 allows one to 
predict the detection performance. An incoming signal with input SNR of 
-3dB cannot be detected by using a 32-point DFT since the output SNR is 9dB, 
which is below the minimum SNR per bin required for detection. By using a 
128-point DFT, the output SNR is 15dB, which is about 3dB above the 
minimal SNR required for detection. The same analysis done by using 
signals with input SNR of -6dB reveals that the 32-point DFT cannot detect 
the changes, while a 128-point DFT copes with the detection successfully. 
Figure 3.12 illustrates the time/ frequency grid for the case of using a 32-point 
DFT with input SNR of -6dB, while Figure 3.13 illustrates Page's test 
implemented on bins 19, 20, 21 (bin 20 being the 156Hz bin) by using a 
128-point DFT to detect energy at 156Hz with input SNR of -3dB and -6dB 
respectively. Similarly, Figure 3.14 illustrates Page's test implemented on bins 
4, 5, 6 (bin 5 being the 156Hz bin) by using a 32- point DFT with input SNR of 
-3dB and -6dB respectively. 
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128 points Dft Periodogram,Grid(30xl28) 




Figure 3.12. Typical Time/Frequency Grid of (30x128) Points. 
128 Point DFT, Input SNR = -6dB 
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Figure 3.13. Page's Test Implemented on Bins 19, 20, 21 of a 128-Point DFT 
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Figure 3.14. Page's Test Implemented on Bins 4, 5, 6 of a 32-Point DFT 
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In order to compare the detection performance of the Page test 
with a conventional detection scheme we refer to Whalen (Whalen, 1971) in 
which the performance (Receiver Operating Characteristic — ROC) for 
detecting M independent sinewave samples in white Gaussian noise by using 
a linear detector (which is the locally optimum detector for Gaussian signals, 
see Kassum, 1988), is analyzed. Even though the detection is not based on 
energy, the comparison presented in the sequel indicates better performance 
of our method. Figure 3.13 illustrates the Page detector implemented on a 
128-point DFT. For an incoming signal with SNR of -6dB the delay for 
detection is 4 blocks and the minimum SNR required for detection (using the 
proper bias value to minimize the SNR) is about 12dB for 77=10 and about 
6dB-8dB for T]= 1. The corresponding bounds for the false alarm rate are 1CH° 
and 10‘ 4 respectively. Figure 3.15 illustrates the ROC for a linear detector for 
detecting four independent samples (equivalent to delay in detection of four 
blocks) of a sinewave in white Gaussian noise (Whalen, 1971, p. 250) where 
the parameter is the SNR required for detection. For this classical detection 
scheme the ROC is in terms of Pfa versus Pd- Thus, to compare the 
performance of these two methods we refer only to values of Pp— > 1 to reflect 
that the detection is almost surely certain. Figure 3.15 illustrates that for 
values of 6dB-8dB the performance of the linear detector is very poor since 
the Pfa is in the order of lCH-lO -3 respectively, while for the Page test it is at 
least 10~ 4 . Furthermore, for the linear detector as the Pfa is lowered, at a 
given (fixed) SNR the Pd decreases, while for the Page detector, equivalently 
lower Pfa (corresponding to higher mean time between false alarms) requires 
a higher threshold and reflects in a higher delay but still, the detection is 
guaranteed. In the operating ranges of above 9dB (which is the typical 
operating range for this type of detection) the Page test is shown to have better 
performance than the conventional linear detection. 
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Figure 3.15. ROC for Detecting Sinewaves 
samples averaged). From 



in White Gaussian Noise (four 
Whalen, 1971. 
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2. Non-Parametric Detection 



Hereby we will consider only the sign detector defined as 

fl for x > 0 



$(*) = < 



-1 



for x < 0 



This nonlinearity is sometimes also referred to as random walk nonlinearity 
since the output g(x) is a random walk. Thus, results from random walk 
theory can be used. Define: 

p(0) = Pr{g(x) = H0} 
q(8) = Pr{g(x) = -110}. 

If p{6) * q{6) there is a positive probability that the process will drift to +°o if 
pW ) > q(6) (and to if p{8) < q{6)). Thus assuming that E[g(x) I 0o) < 0 yields 
pWo) < q(6 o) while assuming E[g(x) I f?i) > 0 results in p(8\) > q(6\). 

The moment generating identity is given by 

E{exp{^(0o)-gW}|0o} = p(^o)exp{/j(0 o )-(+l)} + ^(^o)exp{/i(0o)-(- 1 )} 

= 1 . 



Consider h(6o) = In 



gm 

p(6 o) 



, thus, 




In 



p{0 o) 





= p{o o)- 



o) 

p{ e o) 



+ q{6 0 )- 



p( e o) 

q(Oo) 



- p(o o) + q{ e o) - 1 - 



Hence, 



and 



h(6 0 ) = \n 



q(0o) 
v{ e o) 



E{ s (x)\e,} = p(e,)- q (e l )>o. 



The result is that the lower bound on the performance measures is given by 

V = [p{0\)-q{e i)] lo g^y. 



In order to evaluate the performance measure i), we will use results from 
random walk theory (see Karlin and Taylor, 1984, p. 109) for the 
approximation of the ARL function of Page's test as done by Broder (Broder, 
1990). 



ARL(0) = 






i- 



m 

pw. 



\ a 



q( d ) 



-a 



-1 






(3-24) 



Since under 6 q, q(6v) > p(6o), the average time between false alarms for large a 
can be approximated as 
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1 



T ~ 



q{ e o) 



[p( e o). 
P( 0 o) Kfo)) 1 
-K6b)“ 





#0) 


a 




_p{ 6 0 ). 





q(0 0 ) ^ ~ 

Ip(Oo)-^Oo )} 2 



rM 

p( e o). ' 



Under p(& i) > q(&i), hence, the average delay for large a is given by 

^ a 

D = p(0i)-<?(0i) 

hence, the performance bound 77 is given by (Broder, 1990) 

logT 



r? = lim 
a— >oo D 



log 



<?(%) 

pM 



“/[pW-qW] 



= 77- 



Using this result allows the comparison of Lorden and Wald bounds 
with the approximated results from random walk theory. 
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For the simulation results we considered the symmetric additive 
signal in noise situation, i.e. 



P{x-d) 




P(x+ 6) 



i = 0 
i = 1 



and the noise environments considered were Gaussian and a Gauss-Gauss 
mixture. In order to calculate the p(6) and q(6) parameters as a function of the 
signal and noise parameters consider the following 

Pr{«M = ±1} = Prjx J o}. 



Thus, by knowing the mean of the incoming signal we can use the 
complementary error function to derive both the Gauss and Gauss-Gauss 
mixture cases, as shown in Figure 3.16. For the Gauss-Gauss mixture 

P{6 1) = (l -f )Pl(0l) + £ Pl(0l) 
q( e i) = i-p(0i) 

for which p(6]) < q(6). It follows from the symmetric signal assumption 
(Mo = ~P\) that 

p(0 o) = #i) 

q{d 0 ) = p{6 ] ) 

which results in the desired situation of q(6o) > pWo). 
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Figure 3.16. p(6) as a Function of the Signal and Noise Parameters 
(Symmetric Case, Gauss-Gauss Mixture) 



As shown in the previous example, the root of the moment 
generating function is needed for Lorden's and Wald's approximation. Figure 
3.17 illustrates the root position for different pair values of ( p,cj ). We see that 
as p(6o) < 0.5 becomes larger, the root is smaller which indicates that for a 
given false alarm rate the delay for detection will be larger due to the fact that 
p(6o) approaches q(6o), resulting in a difficult decision situation. In the 
neighborhood where q(6 o) is slightly larger than p(6o) E{g(;r) I (?o) = 0. In this 
situation, biasing the test is needed since the root approaches zero and the 
bound p is not informative anymore. 
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root of sii»n test 




Figure 3.17. The Root of the Moment Generating Function for the Sign Test 



Figure 3.18 illustrates Lorden's, Wald's, and the random walk 
approximation (3-20) as functions of the threshold a for a certain case where 
before the disorder the difference between p(8 o) and q(8 o) is large enough. 
The results indicate good detection bounds. Figure 3.19 illustrates the same 
approximations but now when q(8 o) approaches p(8o), the degradation in 
performance is shown to be in the order of several magnitudes. The values 
for q(6 o) and p(8 o) were chosen to simulate two cases of Gauss-Gauss 
mixtures, resulting in p(&o) = 0.15, q(8o) = 0.85 for the first case, and p(8o) = 0.4, 
q(8o) = 0.6 for the second case. 
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Figure 3.18. Sign Test. Mean Time between False Alarms for 
p(6 0 ) = 0.15, q(6 0 ) = 0.85 



128 



Random Walk Nonlinearity, Before Disorder 

10 e = — ! 1 — : r 1 



H 



10 5 

10 4 

10 3 

10 2 

10 1 



Wald's 

approximation 



Xorden’s 

approximation 



random walk 
•pproximadon 

_i i 



4 5 

Threshold 



Figure 3.19. Sign Test Mean time between False Alarms for 

p(8o) = 0.4, q{8 q) = 0.6 



To analyze the delay for detection we use a similar technique, but 
since we now explore the situation after the disorder, we consider the two 
corresponding cases where p(8\) is larger than cj(6\) and where p(6\) 
approaches Cj(6\). The results are similar to those obtained in the case of the 
false alarm rate and are shown in Figures 3.20 and 3.21 in the form of 
performance curves for Page's Test implemented with the sign detector. As 
in the previous case, p(8]) and q( 6 1 ) correspond to the same Gauss-Gauss 
mixture parameters. 
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Figure 3.20. Performance Curves for the Sign Detector 
p(d 0 ) = 0.15 q(6 0 ) = 0.85 
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Figure 3.21. Performance Curves for the Sign Detector 
p(0 0 ) = 0.4 q(6 0 ) = 0.6 



E SUMMARY 

In this chapter we have described the problem of the change detection and 
of the joint estimation of the change time and the model parameters. Within 
this framework, only the problem of the quickest detection has been 
investigated by using Page's test. In the parametric framework, the linear 
detector and the square law detector were shown to be optimal in the sense of 
quickest detection of changes in the mean and variance of Gaussian 
observations. In both cases performance measures were derived and shown 
to be consistent with the actual results of simulations. A new algorithm for 
detecting changes in the spectral energy was implemented based on locally 
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optimal testing and shown to be consistent with the analytical performance 
results obtained for this test. The bias of the test was shown to reflect a 
tradeoff between the detector performance and the minimal SNR required for 
detection. Finally, the issue of non-parametric detection was investigated by 
implementing Page's test with the sign nonlinearity and testing the 
performance under Gauss and Gauss-Gauss mixture noise distributions. 
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IV. BROWNIAN MOTION APPROXIMATION TO CUMSUM 

PROCEDURES 



A. INTRODUCTION 

In sequential analysis additional simplification results from 
approximating sums of independent random variables X” =1 x,- in discrete time 
by a Brownian motion process {B(0, t > 0} in continuous time. Moreover, for 
cases where the observations do not form a Gaussian process, the discrete 
time process can be approximated by a Brownian motion process which is 
Gaussian. For further discussion on this subject see Reynolds (Reynolds, 
1975). 

To understand the motivation of the use of the Brownian motion process 
as a continuous approximation to the random walk (which describes the 
cumsum procedures), let X\, X2, ■■■ be independent and normally distributed 
with mean fj. and unit variance. If (B(f), f > 0} is a Brownian motion with 
drift n, then S n = £” =1 x,' and B(n), n = 0,1, ... have the same joint distribution. 
The analogy is clear: Brownian motion is an interpolation of the discrete 

time random walk S n which preserves the Gaussian distributions to the 
extent that a random walk process is approximately normally distributed for 
large n. Thus, the Brownian motion process may be used as an asymptotic 
approximation to a large class of random walks and hence of log-likelihood 
ratios. A good reference for a detailed discussion of this point is Siegmund 
(Siegmund, 1985). This chapter concentrates primarily on Brownian motion 
approximations to cumsum procedures (specifically the Page test). A 
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continuous Brownian motion process [B(t), t > 0} is used as an approximation 
to cumulative sums lL(g(x) ± k) which form Page's test. The original problem 
of detecting a disorder as described in Chapter II, is now modified in the sense 
that it can be viewed as a shift in the drift of a Brownian motion 
approximating a cumsum procedure. 

1. Problem Statement 

Let v be the time of shift and let fj. > 0 be the amount of shift in the 
drift of a standard Brownian motion {B(t), t > 0}, B( 0) = 0. Consider the 
observation process 

W(t) = fi(t-v) + + B(t), /i > 0. 

Thus the observation process is a Brownian process with drift 0 up to 
the point of shift v, and n after that. 

The Page test applied to Brownian motion is defined as follows: stop 
at the smallest t for which the one-sided test with boundaries 0 and a stops. 
The test is repeated if the lower boundary 0 is reached before a. Define the 
stopping role to be as 

N = inf{t: S(t)>a} 

where 

S(f) = (W(f) + kt ) - min (W(s) + ks ) 

0<s<f 

for detecting a one-sided positive shift in a drift and 

S(t) = max(W(s) + ks) - (W(t) + kt ) 
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for detecting a one-sided negative shift in a drift. The variable k is the 
reference value or the bias of the test. (Recall the fact from Chapter II that it is 
advantageous to bias the test). Hence, this procedure has two degrees of 
freedom, k and a to achieve a given desired performance. 

2. Organization of this Chapter 

The primary goal of this chapter is to analyze the performance of the 
Page test using the Brownian motion approximation, namely, the evaluation 
of the Average Run Length (ARL) function under the disorder (Delay) and 
under no disorder hypothesis (mean time between false alarms). These 
approximations will be compared with the results obtained in Chapter III, and 
a new error (bias) term which enables the "training" of the Brownian motion 
parameters (drift and variance) and Page's test parameters (k and a) will be 
presented. 

In Section B, general theory about diffusion processes and the related 
stopping time problems is presented. The first threshold crossing time and 
hitting probabilities are shown to be reduced to solving 2nd order differential 
equations. The relation to the Page test is introduced and a new bias term 
which enables the comparison of the accuracy of the calculation is presented. 

Section C deals with the approximation to the ARL functions of the 
cumsum procedure and an explicit form for the bias is calculated. 

Simulation results are presented in Section D and compared to 
simulation results presented in Chapter III. Also, a new error (bias) term 
which enables the "training" of the Brownian motion parameters and Page's 
test parameters is introduced. 

A short summary is presented in Section E. 
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B. GENERAL THEORY ABOUT DIFFUSION PROCESSES AND RELATED 



STOPPING TIME PROBLEMS 

In this section general properties of diffusion processes will be presented. 
It will be shown that many functionals, including the first threshold crossing 
time and associated probabilities, boundary behavior properties and stationary 
distributions of cumsum procedures, can be approximated by using one- 
dimensional diffusions. 

1. General Description and Definitions 

Definition (Karlin and Taylor, 1981). A continuous time parameter 
stochastic process which possesses the (strong) Markov property and for 
which the sample paths X(t) are (almost always) continuous functions of t is 
called a diffusion process. 

Consider a diffusion process (X(f), t > 0) whose state-space is an 
interval I with endpoints / < r. Such a process is said to be regular if starting 
from any point in the interior of I, any other point in the interior of I may be 
reached with non-zero probability. Henceforth, without further mention, we 
shall consider only regular diffusion processes. 

Dynkin Condition (Karlin and Taylor, 1981): A sufficient condition 
for a standard process X(0 to be a diffusion process is the Dynkin condition: 



This relation asserts that large displacements of order exceeding a 
fixed e, are very unlikely over sufficiently small time intervals. This is in fact 




hioh 



(4-1) 



for all x in I 



□ 
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a formalization of the property that the sample paths of the process are 
continuous. 

All diffusion processes are characterized by the mean and the 
variance of the infinitesimal increments. Let AX(t) be the increment in the 
process accrued over a time interval of length h, (i.e., AX(t) = X(t+h)-X(t)), 
then 

Umi£{4X(l)|X(»)- J :} = /i(x,() 

and (4-2) 

limi£{4X(f) 2 |X(() = x}=ot 2 (x,l). 

The functions fd(x,t) and o^U,/) are called the drift and diffusion 
parameters, respectively. In the time homogeneous case, the functions /j(x,t) 
= n(x) and cr^x,/) = a 2 !*) are both independent of t. 

A Brownian motion process (sometimes called the Wiener process) is 
a regular process on the state-space I with parameters nix) = 0 and cr(x) = a 2 
for all x. Adding a trend fdt to the Brownian motion B(t ) produces a 
Brownian motion with drift B(t ) + fit. In this case, the drift parameter is fd, 
while the diffusion parameter remains cr. 

The Brownian motion process (B(f), t > 0} has the following 
properties: 

• B(0) = 0. 

• (B(0, t > 0} has stationary and independent increments. 

• for every t> 0, B(f) is normally distributed with mean 0 and variance c 2 t. 
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When c = 1, the process is called the Standard Brownian motion. Notice that 
any Brownian motion can be converted to the standard process by scaling via 
B(f)/c. 

The behavior of the diffusion process X, = (X(f), t > 0} can be modeled 
by nonlinear stochastic differential equations of the form 

dX t = n(X t ,t)dt + oiX t/ t)dB h t > 0 

with initial condition Xo, where fj. and o are the drift and diffusion 
parameters as defined by (4-2) and where {Bf, t > 0} is a standard Brownian 
motion. Thus, dBt has the interpretation as a “white" noise driver. This 
notation is shorthand for the integral equation 

X, = X 0 + J^(X s ,s>is + j'<r(X s , s)dB s . 

This integral representation of a diffusion process demonstrates the Markov 
property of the diffusion. That is, given X s , for each s > 0 (X/, t > s) and 
(X/, 0 < t < s) are independent. This property is easy to see since for any t>s> 
0, we can write 

X, = X s + fr(X u ,u)du + f s < ! (X u ,u)dB u . 

This equation indicates that (Xj, t > s} can be constructed completely from X s 
and { B U/ t > u > s}. Thus, with X s fixed, {Xf, t > s} is generated independently of 
{Xf, t < s) since {Bf - B s , t > s) is independent of all the past. 

The following theorem determines the parameters of Y(t) =g[X(f)L 
where X(f) is a regular diffusion process. 

Theorem (Karlin and Taylor, 1981): Let {X(t), t > 0) be a regular diffusion 
process with parameters /i(x) and cP'ix) whose state-space is defined on I = ( l,r ). 
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Let g be a strictly monotone function on 1 with continuous second derivative 
g"(x) for l < x < r. Then Y(t) = g[X(f)] defines a regular diffusion process on / 
with the parameters 

(4-3) 



2. Stopping Time Functionals of Diffusion Processes 

In this section we analyze stopping time problems using properties of 
diffusion processes. It is assumed that {X ft), t > 0} is a regular, time 
homogeneous diffusion process. Let a and b be fixed, subject to / < b < a < r, 
and let T(z) = T z be the hitting time of z defined by 



Tz 



oo if X(f)*z Vf>0 

inf > 0; X(f) = z} otherwise. 



We use the notation 

T* = T^=T(a,l.) = miii{T(<i),T(i.)} 

to denote the first time X(f) = a or X(f) = b. For processes starting at X(0) = x in 
(a,b), this is the same as the exit time of the interval (a,b): 

T(a,b) = infjf > 0;X(f) e. ( a,b )}, X(0) = x e ( a,b ). 



a. Stopping Time Related Problems 

This section concentrates on three problems related to the first 
hitting time of a diffusion which are relevant in the case of the cumsum 
procedure. 
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Problem 1. Find 



u(x) = Pr{7» < T(b)|X(0) = xj b<x<a 



(4-4) 



that is, the probability that the process reaches a before b. 
Problem 2. Find 



z>(;c) = E{T *|X(0) = *} b<x<a 



(4-5) 



which is the mean time to reach either a or b. 



Problem 3. For a bounded and continuous function g, find 




Since the sample paths of the diffusion processes are continuous (4-1), the 



whenever the process is in state x, then A would be the total cost up to the 
time when either a or b was first reached. If g(x) = 1 for all x, then A = T*, the 
time to reach a or b, so that problem 2 can be considered as a special case of 
problem 3. 

b. Solutions of the Stopping Time Problems 



is Karlin and Taylor (1981, Ch. 15), where it is shown that u(x), v(x), and w(x) 
possess two bounded derivatives for b < x < a, and that these functions satisfy 
the following differential equations: 

Solution Equation for Problem 1 




A convenient reference for the solution of these three problems 




dx 2 dx 



140 



Solution Equation for Problem 2 



-1 = n(x)^y- + — cP'{x) 

^ Ux 2 1 dx l 

Solution Equation for Problem 3 



d 2 v 



-g{x) = n{x)^ + \o 2 (x) 



dx 2 



. d 2 w 

'77 



for b < x < a, v(b) = v(a) = 0. (4-8) 



for b < x < a, w(b) = iv(a) = 0. (4 -9) 



In order to solve these three problems we need to use several new functions. 
Let 



s(x) = exp' 



-f 



Mil 



di 



for l <x <r 



(4-10) 



be the scale density of the process. The use of an indefinite integral will 
become clear later. Next, the scale function of the process is defined by 

S(x) = j X s{ri)dri (4-11) 

and finally, the speed density is given by 

m(x) = l/[^a 2 (x)'s(x)| for l<x<r. 



Using these definitions, the solution for Problem 1, namely the probability of 
hitting a before b is given by: 



u{x) 



s (*)-s(fr) 

S(a)-S(b) 



b < x < a. 



(4-12) 



The solution for Problem 3 is given as 
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(4-13) 



w(x) = 2|u(x) • J"[S(a) - S(4)] m($)g($)dS 

+[i - U M0 S (€)- $(&)] m {S)g(S) d ^ 



The solution for Problem 2 is obtained by letting g(^) = 1. 

Notice that the solution for w(x) can also be written as: 

u, ( x )= \l G ( x 'Z)g(Z) d Z' (4-14) 



where: 



2 






2 



[S(x)-S(b)][S(q)-S({)] 

S(a)-S(b) 

[S( a )-S(Ar)][S(^)-S(i»)] 

S(‘)-S(b) 



1 

° 2 (^) 

1 



b < x < % < a 

(4-15) 

b < t, < x < a. 



The function G{x ,£) is called the Green function of the process on the interval 
[bfll 



Determining the mean time prior to T* that the process spends in 
the interval [£, l+zl) is equivalent to evaluating 



w (x) = e{L T g(X(s))ds|X(0) = x 



for 



gW = 



1 £ < x < A 



otherwise 



and following the format of (4-14), this is 



M*) = v(x) = E{^T|X(0) = x} = j: +A G(x, t l)dv (4 - 16) 
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we see from (4-16) that G(x,^)d^ measures the mean time AT prior to T* that 
the process spends in the infinitesimal interval [£,£, + dE] given by X(0) = x. 



c. Some Examples of Functional Calculations 

Given the solutions (4-12), (4-13) and (4-16), some cases of 
interest will be examined. 

(1) Standard Brownian Motion. Let {X(D, f > 0} be a standard 
Brownian motion with parameters p(x) = 0, <7 Hx) = 1. Then, 



Thus, u{x), the probability of hitting a prior to b, with initial state x, is 




The scale measure is given by 



S(x) = X. 



/ x X ~b 1 
u(x) b <x<a. 



(4-17) 



The speed density in this case is 






and the Green function (4-15) for the interval [b,a] is 




Direct calculation from (4-14) gives 
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*>' W = eMx(0) = x} = J"g(x,^{ 

= (x-b)(a-x) b<x<a. (4-18) 

Remark. A process [Xit] whose scale function is linear Six) = x, is said to be of 
natural or canonical scale since the hitting probability (4-17) is proportional to 
actual distances. 

Notice that the scale function can be used to rescale the state- 
space (/,r) in terms of probabilities of achieving various levels, and this use 
motivates the name. If a point xo is fixed as the origin, we can easily 
determine a new scale function by performing a translation, causing Six o) = 0 
and form a process Y(f) = SiXit )) on the interval (S(D, Sir)). Since S is strictly 
monotone and twice differentiable, the use of Theorem (4-3) establishes the 
infinitesimal parameters of the process (Y(0): 

My iy) = \<J 2 ix)S"ix) + mMSXx) 

and 

Oy iy) = (j^xMS'Cr)] 2 = c^ix) s 2 ix) where y = S(x). 

The scale measure for { Y(f )} process is Sy(y) = y, thus, the use of the scale 
function enables one to transform a process to a natural scale. 

(2) Brownian Motion with Drift. If (X(f), t > 0} is Brownian 
motion with nonzero drift Mix) =M an <3 variance a 2 , then: 

s(x) - exp(-2/xx/a 2 ) (4-19) 

Six) = A exp(-2/ix/cr 2 ) + B iA and B constants), 



144 



and 



e -2 yx/o 1 _ e -2 yb/o 2 
e ~2ya/o 2 _ e ~2yb/<J 2 

3. Instantaneous Return Processes and the Relation to Page's Cumsum 

Procedure 

This section introduces a certain boundary behavior of the diffusion 
process that defines an Instantaneous Return process. This process is shown 
to describe any cumsum procedure and forms the basis for the approximated 
ARL function. It enables also the derivation of a new bias term which is used 
to evaluate the accuracy of the approximation. 

cl Instantaneous Return Processes (Karlin and Taylor, 1981) 

Consider a diffusion {X (t), t > 0} on the state-space I = ( l,r ) and let 
l < b < a < r. A return process Z(f) relative to [b, a] shown in Figure 4.1 and is 
defined as follows: Starting at a point xo in (b, a), the process is returned 
instantaneously to xo whenever b or a is reached. After such a return, the 
subsequent process behaves just like X(t). This process is repeated at each 
attainment of level b or a. 

The resulting process Z(t) consists of recurrent cycles of random 
time duration T\,T 2 ,T^, where T,- are independently and identically 
distributed, with the same distribution as T a ^ = min{T„, Tb), the first exit time 
from the interval (b, a), starting from xo (stationary process). It follows from 
(4-16) that 

E{7i|X(0) = *<,} = l‘ b G(x 0 ,i)di (4-21) 



«(*) 



b < x <a. (4-20) 
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where G(.xo/£) is the Green function of the process X(f) relative to the state- 
space ( bfl ). 

Let P(t,y) be the density function of Z(t). Thus, 

p ( l >yYy = Pr{y ^ z (0 * y + <*y| z (o) = *o}- 

Define the limiting density of Z(0 as 

a(y\x)=\hnP(t,y). (4-22) 

t—>°° 



To do so, consider an interval [yi,y 2 ] such that b <y\ <y 2 < a and define the 
indicator process {I(t), t > 0) by 



j(0H 



0 



if yj < Z(t) < y 2 
otherwise. 



from Figure 4.1, we see that 

p '•{'(') = i} = £{'(')} = J” P(».yMy- (4 - 23) 



Recalling the renewal theorem (Ross, 1989) (Feller, 1971), we can deduce that 



lim Pr{/(f) = 1} = 

f— >00 



Ejtime spent in {y\,y2) in a cyde|Z(0) = Xq} 
Ejtime duration of a cycle|Z(0) = Xq} 



j“c (* 0 ,m 



(4-24) 



Using (4-23) we get 
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lim Pr{/(f) = 1} = lim \ yi p(t,y)dy 

oo f — yi 



^c(x 0 ,m 

l“ b c(x,m 



Since this holds for every y\, yi e (a, b), it follows that 



a(ylx 0 )= lim P(t,y) 

t—>° o 

G(* o>y) 



j b a G(x,{)d£ 



b < y <a. 



(4-25) 



The stationary density of the instantaneous return process a(y I xo), can be 
interpreted as the proportion of the mean time spent at state y in one cycle T,\ 




Figure 4.1. Instantaneous Return Process 
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b. Relation to the Cumsum Procedure 

Let N denote the stopping rule based on the cumsum test with 
reference value k stopping boundary a and restarting boundary b, 

N = inf{f:X(f) > a] (4-26) 

where 

X(t) = (W(t) + kt)- min(W(s) + fcs) 

0 <s<t 

is based on the observation process 

W(t) = n{t~v) + + B(t), n > 0 (4-2 7) 

where (x) + = max(0,x), and n defines the amount of shift in the drift of a 
standard Brownian motion B(t) with B(0) = 0, at the point of shift v. The 
reference value k is chosen to minimize the Delay for detection. 

Before the shift occurs, the reference value guarantees that the 
test will hit the lower boundary and cause a restart. Each restart will force the 
process to return to the initial state *o and start once again, thus, the restart 
process can be considered as causing an instantaneous return process. 

Notice that before the shift occurs, the process W(f ) is a Brownian 
motion with drift k, while after the shift (change) in drift occurs, W(f ) is a 
Brownian motion with drift /i+k. Let L be the number of restarts before the 
shift, and let {N,} be the corresponding run length intervals of the test until 
the shift is detected (i = 1, ..., L+l). Hereby, we follow the analysis as given by 
Srivastava and Wu (Srivastava and Wu, 1990). 

L L+l 

£n,<v<Xn,- 

i= 1 i=l 
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The average delay time is given by 



fL+1 



D(i)=E, £n,-(. 



1=1 



where E t (} denotes the expectation when the shift occurs at a fixed time t. 
The asymptotic average delay time or the stationary average run length is 
defined as 



We denote ARL M (xo) as the Average Run Length of the diffusion X(f) when 
the shift in the mean is fd at the initial state Xo = *o- Similarly, 
ARLo(O) = ARLo denotes the ARL under no change, namely, the ARL when 
there is no shift in the drift and the initial state is zero. Hence, ARLo is the 
mean time between false alarms (with initial state zero). 



by the restart process will be at some stationary state, say y, when the shift 
occurs. Denote the stationary density of this state y as a(y I xo). Figure 4.2 is the 
appropriate picture to guide the analysis. Suppose that we use this state y as a 
new initial state for the detecting process with shifted mean to find ARL^(y). 
Thus, the stationary average delay time (4-28) is given by 



Notice that ARL ^(xo) can be interpreted in two ways. First as a weighted 
average of ARLs under disorder over the set of all possible initial states y 
taking into account the effect of the distribution of run length before the 
disorder. Second, time-wise, ARL ^(xq) takes the weighted average of all 



ARL = lim D(t). 



(4-28) 






Under our assumptions, the instantaneous return process caused 




(4-29) 
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possible places of shift (since for each realization of X(t), each different y is 
related to a different shift time), conditioned that the shift occurred. Since 
ARL^fy) is a decreasing function of y, we obtain that 




ARL^(x 0 )<ARL #l (y). 

Since the location of the change point v is not known inside the last run 
length interval N^+i, the approximated ARL should take into account all the 
possible places of shift within the last run length interval, thus, the 
approximated stationary ARL under change (Delay) is obtained by (4-28) and 
(4-29) while the bias of the approximation can be obtained by 

bias(x 0 ,//) = ARL^ (xq)- ARL^ (x 0 )• (4-30) 

Hence, (4-30) "measures" the effect of the point of shift in the limiting 
situation for the cumsum procedure. In the following section, we will use 
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the theory of this section to derive the diffusion approximation to ARLo, 
ARL^(y) and ARL for the cumsum procedure. These approximations will be 
used to compare and measure the accuracy of the theoretical results derived 
in Chapters II and HI. 



C BROWNIAN APPROXIMATIONS TO THE ARL FUNCTIONS OF THE 
CUMSUM PROCEDURES 

The approximation to the run length functions for the one-sided Page test 
for an increase in the drift, will be obtained with the aid of the following two 
lemmas. Before presenting the lemmas, one key principle of the diffusion 
process which is relevant in our case needs to be addressed. This will be done 
in the following section. 



1. The Reflection Principle (Karlin and Taylor, 1968) 

A Brownian motion with a reflecting boundary at zero behaves as a 
standard Brownian motion in the interior of its domain (0,<»). However, 
when it reaches its zero boundary, then the sample path returns to the 
interior in a manner of that of a light wave reflection from a mirror. In 
general, consider { Z(t ), t > 0} with Z( 0) = 0 and Z(t ) > a (a > 0). Since Z (/) is 
continuous and Z(0) = 0, there exists a random time r at which Z(/) firsts 
attains the value a. For t > t, we reflect Z(t) about the line z = a to obtain 



x(0 = 



z(0 

a-[Z(ij-a\ 



for t < r 
for t > t 



(4-31) 



(see Figure (4.3). Note that X(/) < a since Z(T) > a. Because the probability law 
of the path for t < t, given X(t) - a, is symmetrical with respect to the values 



151 



x > a and x < a and independent of the history prior to time t, the reflection 
argument displays for every sample path with Z(T) > a, two sample paths X(f) 
and Z(f) with the same probability of occurrence. 




Figure 4.3. The Reflection Principle about Line a 

The following lemma establishes the fact that the Page cumsum procedure 
(X(t) given by (4.26)) with boundaries (0^) results in a Brownian motion with 
an absorbing barrier at a and a reflecting barrier at 0. The second lemma uses 
the fact that the reflecting barrier is at 0 to obtain the result that before the 
disorder, the process X(t) with a reflecting barrier at 0, can be viewed as the 
absolute value process (set a = 0 in (4-31)). Thus, the reflecting boundary 
phenomenon is equivalent to setting X(t) = I Z(t ) I . 

Lemma 1 (Bagshaw and Johnson, 1975) 

Before the shift occurs, the process X(f) given by (4-26), has the same 
probability law as a Brownian motion W(f) given by (4-27) with drift k and a 
reflecting barrier at 0. □ 
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Lemma 1 applies to any diffusion type process. It states that before the 
shift occurs, X(t) = (W(f) + kt) - mino< s </(W(s) + ks) and I W(t) I have the 
same distribution. Moreover, the distribution of the first passage time of X(t) 
to a can be determined by finding the distribution of the first passage time of a 
process with a reflecting barrier at zero to an absorbing barrier at a. Thus, it is 
clear that after the shift occurs, X(f) and lW(f)l do not have the same 
distribution (since a is an absorbing barrier). 

Using the results of lemma 1, two alternative methods can be used to 
get the desired approximation for the ARL function. The following two 
subsections describe these methods. 



2. Direct Calculation of the ARL Function via the Functional (4-8) 

Let (X(f)} be a Brownian motion on / = [0,°°) with drift \i and variance 
parameter a 2 , where 0 is a reflecting boundary. Let T a be the hitting time to 
level a > 0, and set n(x) = E{T a \ X(0) = x) for 0 < x < a. Then, v(x) is obtained by 
solving the differential equation (4-8) and is given by (Bagshaw and Johnson, 
1975) and (Karlin and Taylor, 1981): 




\i* 0 



/< = 0 



(4-32) 



where 



y= [ild 1 . 
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Recall that before the shift occurs, X(f) is a Brownian motion with drift k, and 
since Xo = 0 (Page's cumsum test), the ARL before the shift is obtained from 
(4-32) as follows by setting n = k 



1 1 
— a 

*L 2 r 



(l-e -2 * 1 ) k*0 



ARLq = ARLo(0) = 



(4-33) 



o 2 



k = 0 



where 




After the shift, X(0 is a Brownian motion with drift n+k, and since the initial 
state is given by Xq = y (see Figure 4.2), the ARL after the shift is given by 



3. Calculation of the ARL and ARL Functions using the Green 



Lemma 1 established the result that before the shift occurs, X(t) has 
the same probability law as a Brownian motion with drift k and a reflecting 
boundary 0. The following lemma uses this result to transform the reflected 
Brownian motion into another diffusion process, for which we can use 
theory established in the last section, namely, the use of the Green function to 
derive the ARL function. 




where 



y* = (ju + k)/c t 2 . 



Function 
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Lemma 2 (Karlin and Taylor, 1968) 

Let X(t) be a Brownian motion with reference value k (bias) as defined in 
(4-26). Then, before the shift occurs, X(t ) has the same probability law as the 
process |w(f)|, where VV(f) is a Brownian motion with parameters 

H^(z) = (sign z)-k 



o^(z) - o^(lzl) = constant. 

for all z in the state-space /. □ 

Thus, the reflecting barrier phenomenon is equivalent to setting 
X(f) = |w(f)|, where VV(f) is a Brownian motion on (-a, a) having parameters 
given by Lemma 2. Hence, the stopping rule (4-26) can be modified as 



N = inf{f: |w(/)| >a) 



which is the first exit time from the interval (-a, a). Thus, the reflected 
Brownian motion which describes Page's cumsum procedure is transformed 
to a nonreflected Brownian motion to which we can apply the results 
obtained for regular diffusions. 

Recall the definition of the Green function given by (4-15). Then, for 
the process X(t) in terms of the process VV(f), the scale density function (4-10) 
is given by 



-J 2 

s(z) = e 



Mil 

* 2 (?) 






= e -2lzl k/o 2 



-a < z < a. 
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Since the initial value in our case (Page's Procedure) is x = 0, we get the Green 
function for the cumsum procedure (with b = -a) for the no change 
hypothesis: 



G(0,z) = 



f° r 2\M/S du r e -2 IM/S 
2 



j a '-2\z\kl<p- 



-a < 0 < z < a 



J» C -2U k,«> du .£ 


■ e -2\ u\k/a 2 du 
■a 


> e -2Mk/^ du - 

J -a 


J. . e -2lz lt/<7 2 



-a < z < 0 < a 



Now define y=k/ o 2 

;2l2lrf mi„(o .*) e -Mr du .r- ,-n uir du 

J-a J max(z,0) 



= 2 - 



cr 2 j a e 2lu,y du 



-a < z < a 






i _ e 2(izi-fl)y 
2k 



-a <z <a. (4-35) 



This result agrees with the result shown by Srivastava and Wu (Srivastava 
and Wu, 1990) except that by (4-35) it is assumed that the process has a general 
diffusion parameter o 2 . Hence, from (4-16), ARLo is given by 

ARL 0 = £ fl G(0,z)dz 
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and a direct calculation yields the same result as given by (4-33). Using these 
results, we get from (4-25) the stationary density of the process X(t) defined by 
(4—26) is given by (y is a stationary state of the process X(t)) 



a(y\x = 0 ) = — — 0 <y <a 

\ a Q G{0,y)dy 

i- e - 2 ri a -y) 

a 

2y 

Thus, using (4-34), ARL is obtained as 
ARL/i (°) = JqARL^ (y) • cc(y\ x = 0 )dy 



a-- 



\-e~ 2a ^ 

27 



■{li + k)y* 



/o [- 



-2y(a-y) 



(a-y)2y*-e 2y * y + e 2y * a dy 



Y* 


a 2_ 1 | 


(l-(l + 2}o)r 2 ’ G ) 


. . e ~2y*a 


1- 


-e 2ya | e 2y * a -l e 2 ^ r ^ a + 2(y*-y)a-l 


[ 2y 2 




2 y 2 y* 2 (y*-y) 


(y*/y){li + k) 


2 ^° + 2)o-l 





(4-36) 



where: 



y = k/o 2 (before shift) 

y* = (fj + k) / or (after shift). 
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Having calculated ARL^fy) and ARL ^(0) yields an analytical approximation 
to the bias as defined in (4-30). 

D. RESULTS 

Using equations (4-33) and (4-36) for calculating ARLo and ARL^fO) 
respectively, the error (bias) term has been calculated via equation (4-30): 

bias (0,/i) = ARL^(O) - ARL „(0). 

For the symmetric case k = -/i/2, y= -/i/2a 2 (before change) and 'f = fi/2o 2 
(after change). The reason for this assumption is that it has been shown 
(Bagshaw and Johnson, 1975) that this is the optimal reference value if the 
objective is to minimize the ARL^ function. 

Figure 4.4 illustrates the bias term as a function of the drift. For lower 
values of the reference value k the bias term is in the order of about 10 
samples. 

Figure 4.5 illustrates the effect of the initial value on the delay of the 
cumsum procedure for an initial value of y = 5. 

Figure 4.6 illustrates the ARL function for both the delay and the mean 
time between false alarms, as obtained by using the Brownian motion 
approximations. 
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Drift 



Figure 4.4. The Bias Term as a Function of the Drift. 
Symmetric Case k = -fi/2 
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Figure 4.5. The Effect of the Initial Point y on the Delay 
Symmetric Case k - -fi/2 
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E SUMMARY 



In this chapter an additional viewpoint to the analysis of cumsum 
procedures was introduced by using the Brownian motion approximations 
for stopping times. The problem of determining the probability, the average, 
and some general cost function of the stopping time was shown to be reduced 
to a closed form. 

Next, the behavior of the diffusion process was investigated for two cases. 
In the first case, the cumsum test was shown to be modeled as a diffusion 
instantaneous return process which enabled the derivation of the stationary 
density of the diffusion, thus representing the density of the cumsum process. 
In the second case, the behavior of the diffusion process near a reflecting 
boundary was investigated and shown to be the key to determining the 
approximation of the ARL function for cumsum procedures. Finally, a new 
error (bias) term was developed allowing one to predict the average error in 
the delay for detection. Also, a new procedure of “tuning" the diffusion 
parameters to a given problem was introduced. The drift parameter was 
shown (as expected) to be the most influential parameter for the 
approximation. 
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V. QUICKEST DISORDER DETECTION METHODS: THE BAYESIAN 

FRAMEWORK 



A. INTRODUCTION 

Consider once again the disorder formulation of (1-2), where the 
observations x\, xi, ... are i.i.d. random variables, such that up to a certain time 
v > 1, X\, ..., x v -\ are identically distributed with distribution Po(x), while x v , 
X v +i, are identically distributed with another distribution P\(x), where Po(x) 
and P\(x) do not depend on v. In the non-Bayesian formulation the random 
time v is considered as a parameter, and this formulation leads to classical 
problems of hypothesis testing. By the Bayesian approach, the parameter v is 
considered as a random variable with a certain distribution. As in the non- 
Bayesian approach, we shall be concerned mainly with the problem of how to 
use the observations to determine as quickly as possible the time v, or the 
"disorder" situation, for a given false alarm ratio. Shiryayev (Shiryayev, 1978) 
and Roberts (Roberts, 1966) independently proposed an approach similar to 
cumsum procedures. We shall refer only to Shiryayev and use his notation. 

Shiryayev solved the problem of quickest disorder detection subject to a 
constraint on the probability of false alarms Pr{N < v) < a for all v (where N is 
the stopping time) in the Bayesian framework. 

The following section gives a short presentation of his work and some 
important results which will be used next to establish new results. 

This chapter is organized as follows: In Section B we introduce 

Shiryayev's results which are relevant to our case, and form the basic 
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underlying observation process which is used to solve the Bayes version of 
the cumsum procedures. Section C presents a new approach to evaluate the 
performance of the Bayes version for cumsum procedures. The analysis is 
based on the Shiryayev optimal Bayes solution (Shiryayev, 1978) and the 
modification of the double procedure algorithm of Assaf and Ritov (Assaf 
and Ritov, 1988) and uses Brownian motion approximations to solve the 
Bayes problem. 

Finally, Section D contains a short summary of the results. 

B. BAYESIAN APPROACH TO CUMSUM PROCEDURES APPROXIMATED 

BY BROWNIAN MOTION 

1. Problem Formulation 

As mentioned in the introduction, we will follow the work done by 
Shiryayev (Shiryayev, 1978), thereby, a new derivation of the performance of 
the cumsum procedure will be introduced in the Bayesian framework, using 
some of Shiryayev's results. The problem will be presented in terms of a 
Brownian motion process which approximates the cumsum behavior (see 
Chapter IV). 

Consider a Brownian motion process {W(0, t > 0}, which during the 
time interval [0,v] has zero drift, and during (v,«>) has drift n> 0, where v<<» 
and n are unknown parameters. The process W(f) satisfies the stochastic 
differential equation 

dW t = /d(t - v) + dt + a dB t , iv 0 = 0 
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where (a) + = max(O^j) and B(t) is a standard Brownian motion with B(0) = 0. 
In other words, the structure of the observed process is 

oB t t < v 



W t = 



p(t - v)+oB t 



t> v 



(5-1) 



where v is considered as the (unknown) "disorder" time in which a disorder 
takes place in the observed process, and the local drift shifts from zero to /i. 

In what follows we assume that v is a non negative random variable with 
a priori distribution 

Pr{ v = 0} = pq, Pr{ v > f | v > 0} = (5-2) 

where po and X are known constants. Let N be the stopping variable which 
defines a certain class of detection rules (p. The class (p of those solution rules 
for which N e <p is finite with probability one, is denoted by A. 

For every <pe A, let 

R(<P,p 0 ) = Pr {N < v} + c • E{(N - v) + } 

be the risk consisting of the probability of a false alarm Pr{N < v) and the 
average delay of detecting the disorder correctly, E{N-v| N > v}. The cost of 
one observation is assumed to be c > 0. Thus, the cost of the false alarm 
compared to the cost of the delay in detection is determined by the value of c. 
Define 

p(N-)= inf R(<p,p 0 ). (5-3) 

<peA 



where N* is the optimum stopping rule which minimizes the cost function. 
Hence, the problem can be slightly changed, i.e., to find among all the rules 
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<P € A with a given probability of false alarm a = Pr{N < v), a rule N*e <p 
which guarantees the minimum of the mean time of delay, if the detection 
was correctly done, i.e., such that 

D(a) = inf E{N * - vjN* > v} (5 - 4) 

4>eA 

where N* is called the Bayes time. Thus, the Bayes problem of quickest 
detection can be formulated in the following way: 

For a given false alarm probability a = Pr{N < v), find the observation 
method with the minimum average delay, (which minimizes the risk (5-3)). 
The following theorem establishes the optimum observation method in the 
class of decision functions <p <£ A. 

2. The Optimum Bayes Solution 

Theorem (Shiryayev, 1978): For a given false alarm probability 

Pr{N < v) < a < 1, the optimum observation method for the problem of 
minimizing the average delay as defined by (5-4) consists of observing the 
process 

Z, = Pr{v<f|W s , s<f} (5-5) 

with the initial condition Zq = po < 1 and deciding that a disorder is present 
when a threshold a < 1 is first attained. Hence the stopping rule is given by 

N = inf{f: t > 0, Z t > a) 

where a = 1 -a. The process Z t satisfies the following differential equation: 

dZ t = X(l-Z t )dt + ^Z t (l-Z t )dB t Z 0 = z. (5-6) 

<7 

□ 
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For the proof of optimality of N, see Shiryayev (Shiryayev, 1978, 
Ch. 4). The theorem gives two important results: First, the structure of 

optimal Bayes time N* solutions consists of observing the current posteriori 
probability that the change has already occurred. The process W t is observed 
until the process Z t reaches (at time N*) for the first time a certain level a. 
Second, Z t is a diffusion process with time homogeneous coefficients given by 



where p and o are the time homogeneous coefficients of the observation 
process (5-1). Notice that when A — » 0, i.e., when the mean time at which the 
disorder occurs E{v} = A -1 tends to infinity, hence p{z) = 0 and the diffusion Z t 
has a zero drift. Notice also that in this case it is natural to assume that a — » 1. 
This situation indicates that the disorder appears on the background of an 
established stationary regime. Shiryayev solved the problem of quickest 
detection under this assumption. For a given mean time between false 
alarms T, under the optimal method ot observation, the mean delay time 
D(T) is given by 



M z ) = A(l-z) 

<7 2 (z) = [(m/ct)-z(1-2)] 2 



(5-7) 



D(T)=E{N-\\N > v} 



= i{log(yT)-l-C} 



(5-8) 



where 



y-p 1 ! 2cr 



C = 0.577...= Euler constant 



T = ( 1 - a) / A 
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Notice that the assumption that the disorder is preceded by a long process of 
observation in which a stationary regime is established implies that A — » 0, 
a — » 1, but such that T = (1 -a)/X is fixed. 

The results established in this section, in particular the diffusion type 
behavior of the optimal observation method, namely, the posterior 
probability of change, motivates a new formulation of the quickest detection 
of cumsum procedures and will be presented in the next section. The analysis 
will make use of the theory of diffusion processes established in Chapter IV. 

C THE BAYES SOLUTION TO CUMSUM PROCEDURES 

The framework set by Shiryayev enables a convenient formulation of 
quickest detection problem for cumsum in the Bayesian framework. Hereby, 
there analysis of Shiryayev (Shiryayev, 1978) and Assaf and Ritov (Assaf and 
Ritov, 1988) is modified to obtain a new performance analysis of the optimal 
Bayesian stopping time solution for cumsum procedures. 

1. Problem Formulation 

The behavior of cumsum procedures as processes which exhibit 
renewal properties (Chapter II) and which can be described by instantaneous 
return processes (Chapter IV) establishes the observation that for a general 
cumsum procedure, the process of local minima (or local maxima) results in 
regimes (i.e. periods between successive local minima points) in which the 
diffusion approximation has a certain drift. The disorder occurs in one of the 
regimes, where the diffusion approximation will exhibit a change in the drift. 
The problem of quickest detection is concerned with the minimization of the 
average number of bad regimes which are mistakenly accepted during one 
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cycle, i.e., between two successive alarms. Notice that if the disorder occurred 
in the last regime in the cycle, then the average delay is given by ARL M (xo ) as 
defined in (4-29). Let L be the number of regimes in one cycle. Thus, the first 
L-l regimes are accepted, each time the present regime is accepted the test 
continues to the next regime, while the last one is rejected and produces the 
alarm. Let X,- be the set of observations within the regimes, i.e., Xj denotes the 
observation set within regime 1, etc. Assume that the (true) change occurred 
in regime v. Thus, Xo, Xj, ..., X v _i are independently distributed according to 
some Pq while x Vf x v +i, — / are independently distributed to some Pi- 
Assumption 1: Both Pq and P i are the normal distributions with known 
means po and p] and common variance <T 2 . 

Assumption 2: It is assumed that the change occurs only between regimes 
and not within a regime. This assumption can be justified by the fact that by 
using the ladder variable approach it was shown in Chapter II that the process 
of local minima reflects the set of time instants which are more likely to be 
the change points. Moreover, it was shown (2-33) that the actual number of 
regimes within a cycle is geometrically distributed. Following Assumption 2 
we establish the last assumption. 

Assumption 3: The change regime v has a prior which is geometrically 
distributed with a known parameter 0 < p < 1, i.e., Pr{v = n) = p ■q TX ~' 1 for n > 1. 

Let L be a stopping time for declaring a change. Let a = Pr{L < v} be the 
probability of false alarm and let D = E{L-v} + be the expected number of 
regimes which are mistakenly accepted in one cycle (i.e., which are 
mistakenly identified as regimes containing “no change" information). 
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Consider the following version of the optimal problem: find a 

stopping time L which minimizes D subject to the constraints a and ARLo(-), 
where a is a given probability of false alarm and ARLo(-) is the mean time 
between false alarms defined in (4-29) 

2. Optimal Bayes Solution 

The solution of the optimal Bayes problem is given by (5-5) and is 
denoted as the Z process. For any regime, Z t is the "current" posterior 
probability that the change has already occurred given the first t regimes. 

Z, = Pr{v < l\X 0 ,X lf ...,X e } t = 0,1,2 (5-9) 

Due to Shiryayev results, (5-9) is defined as the observed process which 
behaves like a diffusion process with time homogeneous coefficients given by 

H{z) = 0 

„ , (5-10) 

cr(z) = [(/ 1/i / <x)z(l - z)] 2 

where Afi = 

Notice that the underlying model assumes that within a regime the cumsum 
behaves like a Brownian motion with drift parameter fio or fi\ and variance 
parameter ex 2 . The corresponding observed process (5-9) has zero drift 
parameter due to the fact that it is assumed that the change does not occur 
within the regime. Notice that this assumption results in a natural scaled 
diffusion whose scale function is linear, S(z) = z. 

The observed Z / process is defined on the state-space I = (0,1). Let 
0 < b < a < 1. The optimal stopping rule is defined as follows: 

• accept the present regime and move to the next one whenever Z/<b. 
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• continue sampling within the present regime as long as b < Z/< a. 

• reject the present regime and declare a disorder as soon as Z/>a. 

Thus, the stopping regime is given by 

L = inf{£: Z/> a). 

The initial value of Z at the first regime is zo = p while a decision to accept a 
certain regime and to move to a next one results in an initial condition 

zq = b + p(l - b). (5-11) 

This result is due to the fact that for any regime in the cycle except the last one 
the test is terminated at the lower boundary, Z/= b 0 < l < L- 1. We obtain 
(5-11) by using the law of total probability. All the regimes following the first 
one have the same probabilistic behavior. Thus, their initial z values are 
given by (5-11). See Figure 5.1 for a pictorial illustration. 

3. Cumsum Performance Analysis 

The goal of this section is to find the relationship between the test 
parameters a, b, and p and the delay D and the probability of false alarm a. To 
start the analysis we need to find E{L), the average number of regimes within 
a cycle. Note that L is modified geometrically distributed since L is a mixture 
of two random variables, the first of which is identically zero and the second 
of which is geometric (Assaf and Ritov, 1988). Hence, the probability of 
success p should be calculated when the initial value is zo and the probability 
of failure q should be calculated with initial value zq. 
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E{L) . 3 

Pr{z(n) = fc} 

Pr{z(n) = a] 

Prjhitting b before ajinitial regime value = zq 
P rjhitting a before ^initial regime value = zq ■ 

Recall the results obtained for a general diffusion process Z * for solving for 
the probability of hitting the boundary a before b as given by equation (4-4). 
The solution is given by (4-12). 

u( z 0 ) = Pr{T(fl) < T(b) | Z(0) = z 0 ) b < z 0 < a, 

hence, E[L) can be obtained by 
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(5-12) 



£{L} = 



i~“( z o) 

#o) 



Since the observed diffusion Z t has zero drift coefficient within any regime, 
the Z process has a natural scale function S(z) = z. Thus, using (4-17) we 
obtain 



u(z o) = 



zp —b 
a-b 



b <z 0 < a 



_ ZQ-b 

u ( z o)) = ~^Zf b <zq< a 

which results in 

E{L } = (a-z Q )/(z 0 -b). 

Using z 0 = b + pd-b), the expected number of regimes per cycle is given by 

EIL} = (a-z 0 )/p(l-b). (5-13) 

Having derived an explicit form for the of average number of 
regimes per cycle £{f}, enables one to show the relationship between a 
(probability of false alarms) and D (delay) with the test parameters a, b, and p. 

To compute a, notice that when the observed process 
Zi = Pr{v < L | Xo, ..., X/) crosses the upper boundary a and causes an alarm, then 
Zi = a or Pr{v < t \ Xo, ..., X;) = a, thus it follows that 1 -a = a, or 

a = 1 -a. (5-14) 

To compute D = E{L-v) + , notice that Z* is the expected value, using posterior 
information of the indicator function / (v <*) (Shiryayev, 1978), i.e. 
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1 



V<£ 



hence 



hv<t) = 



[0 



v > £ 



Pr {'(«<) = l|*0 X , } = Pr{ v £ t\X 0 „ . . X, } 



= Z, 



Thus, 



D = E{L - v} + 

L r i 

= Ie{'(vS/)|X 0 --X/} 



= E 



L 1 
1 2 / 



U=i J 



(5-15) 



We obtain (5-15) which is consistent with Shiryayev's result, but here the 
derivation is done in a much simpler way. Since for 0 < £ < L - 1 the 
Z process terminates by crossing the lower boundary, thus 
Z/ = b for £ = 0, ..., L- 1. Hence, 
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D = E 



L 

I*/ 

*=1 




izjt 

U=1 



= E{L£{Z,}} 

= E{L}-b 

= (a-z 0 )b/p(l-fc). 



(5-16) 



4. Asymptotic Analysis 

In this section we are concerned with the asymptotic analysis of the 
delay D as p — > 0. Since 1 /p determines the average rate of changes, this 
asymptotic analysis will indicate the performance of the optimal algorithm 
when the rate of changes is small. Hereby we shall consider the constrained 
version of minimizing D subject to given values of Pr{v < L} = a and 
ARLo(zo) = T, the regime time. 

The analysis starts with computation of the average cycle time E{C } 
which is needed to analyze the asymptotic average delay. The diffusion type 
behavior of the observed process Z* enables the use of techniques introduced 
in Chapter IV to obtain the result for E{C). Finally, we obtain an asymptotic 
approximation for the average delay. 

cl Calculation of the Mean Cycle Time E(C) 

Figure 5.1 is the appropriate picture to guide the following 
analysis. To compute the expected cycle time, E{C), notice that the first regime 
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run length in a cycle starts with initial condition zo, while all the following 
regimes start with initial condition 2o given by (5-11). Let Nj be the sampling 
time of the first regime in a cycle, then E{N\) is the expected time it takes the 
diffusion to reach b or a. Similarly, let N be the time needed for the diffusion 
starting at to reach b or a. The total run length of a cycle is given by 
C = where L is the stopping regime and with N\, N 2 , independent 

and identically distributed like N . Applying Wald's equation and using 
(5-13) we obtain 



To make the computation simpler we consider the long run 
situation, using the simplification zq =zq. In this case (5-17) becomes 



The last result is due to the fact that E{ N } is the average regime time which 
is by definition equal to ARLo(zo) since within the regime the drift coefficient 
is zero. 

b. Calculation of ARLq(z) 



E{C} = E{N 0 } + (E{£}-1)-E{n} 




(5-17) 



£{C} = E{L)e{n} 

= [( a - 2 o)/p(l-!’)]' E {N} 

= [( a - 2o )/p(l-i>)]-ARL o (2 0 ). 



(5-18) 



The observed diffusion Z* is in natural scale since the scale 
measure is linear (see 4-17), i.e.. 



S(z) = z. 
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Using the expression of the variance coefficient (5-7) we obtain from (4-15) 
the Green function for the observed diffusion Z/ 

(z-bp-S) 






(^/<r) 2 (a-b)£ 2 (l-£) 

(t-b){a-z) 

{A^i / cf{a-b)t > 1 {\-^) 



2 b <z < £, < a 

2 b < £ < z < a. 



The Average Run Length ARLo(z) is given 
(Assaf and Ritov, 1988), 

ARL 0 (z) = e{n] 

= j“G(z,Z)dt 

1 

(Afi /2<j) 2 (a-b) 



(z-fc)(2fl-l)log 




+ (fl-z)(l-2b)log' 



z ( ] - b ) 



For the limiting situation 

lim z n = lim {b + p(l - b)\ — > b 

p-» 0 p— >0 l J 



(5-19) 



it follows that in this situation ARLo(b) — » 0 (as anticipated). Thus, it follows 
that for the constraint ARLo(z) = T to be satisfied, we need b — » 0 resulting in 
0/0 situation. 
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lim ARLo(zq) = lim 

p — >0 



1 



£*0(Ati/2o) ( a-b ) 



(a-b-p{l-b)){l-2b)\og 



\ (b + p{l-b)){l-b) 
| b(l-b-p{l-b)) 



= lim 7 T- 

p-rt(Ap/2ofa 



[a log(l + p/b)] 



1 

(Ap / 2<t) 2 



log(l + p/b). 



Notice that ARLo(zo) approaches in the limit to a finite value. 



(5-20) 



c. Asymptotic Delay 

For the limiting situation we also obtain the following 
approximations: 



lim E{L} = lim (1 - z 0 ) / p( 1 - b ) 

z — >0 z-»0 

b-> 0 b-> 0 



= a/p 



= (l-a)/p 


(5-21) 


and 




lim E{C} = (1 - a) ■ ARLq(O) / p 

z— >0 
b-»0 


(5-22) 


and for the constraint ARLo(O) = T we need 




p/b = e T( - A rf /2 ° 2 -1 




^ e T(At,) 2 /2a 2 


(5-23) 



Substituting (5-23) in equation (5-16) for D, we obtain 
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limD = ab / p 

Z 0-»0 

= (1 - a)b / p 

= (1 -a)/^e T ( A rf /2<j2 ^ (5-24) 

Hence, the asymptotic average delay is given in terms of the constraints a and 
T and the signal parameters A/i = P\-Pq and cr 2 . 

Since the ratio Ap/ o can describe a measure for signal to noise 
ratio, the average delay (5-24) can also be described as 

D = (l-«)/(e <T/2XSNR > 2 ) (5-25) 

Notice that in the limiting situation the average delay does not depend on p. 
Once again, as for Shiryayev's result (5-8), as p — » 0, a — » 1, and the delay D 
approaches the limit to a finite value. 

D. SUMMARY 

The fact the cumsum procedure can be viewed as a process of local 
minima (or respectively, maxima) enabled the use of the Brownian motion 
approixmation to the optimal observation process Z/. With the aid of these 
tools, ARLo(-) given by equation (5-20) and the asymptotic delay (5-25) were 
derived and shown to reach finite values. 
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VI. DETECTION-ESTIMATION ALGORITHM FOR NOISY DATA WITH 
ABRUPT CHANGES (DISCONTINUITIES) MODELED BY THE PIECEWISE 

STATE-SPACE MODEL 



A. INTRODUCTION 

Until now, all of the chapters dealt with problems of disorder as defined 
for Types 1, 2, and 3 (see Chapter 1). In this chapter we present a Type 4 
problem, namely, an initial condition disruption problem. The use of state- 
space models as descriptive models for the initial condition disruption allows 
the joint estimation of the change time v and the state-space parameter 
representing the observed signal. This methods seems to be efficient 
compared to GLR methods for certain classes of problems since the Kalman 
filter gains and covariance matrix can be computed off-line if the state-space 
matrices do not change in time. However, this is not the case for AR or 
ARMA modeling in the state-space format. 

The problem of detection-estimation or detection-smoothing of signals 
with time-varying statistical characteristics is of great interest in many areas of 
signal processing. In many cases, prior knowledge of the signal characteristics 
can be used to model (using model-based techniques) the non-stationary 
behavior. In this section, the statistical changes are modeled by piecewise 
deterministic state-space equations with random initial conditions, and 
measurements corrupted by additive Gaussian white noise (Cristi, 1988). A 
particularly interesting class is the case of signals representable by Auto- 
Regressive models with piecewise constant coefficients (Andre-Obrecht, 1988). 
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Also, the class of PSK (Phase Shift Keying) signals enters this category, where 
the phase of the sinusoidal carrier is shifted according to the information 
(Point, 1987). For the PSK the phase shift of the sinusoidal carrier can be 
modeled by change of initial condition of a state-space model that describes 
the sinusoid. The goal is a non-coherent detection scheme that will recover 
the piecewise constant phase. 

For such classes of signal models, we can approach the combined 
detection-estimation problem as a combination of: a) detection of the 
transition points, in order to segment the data field into compact regions 
having similar characteristics (for example constant phase in the PSK signal), 
and b) filtering within the regions to reconstruct the original signal. In the 
estimation framework the joint estimation of the change time and the model 
parameters can be achieved. 

Previous works (Cristi, 1990) used techniques based upon the 
combination of Markov Random Fields (MRF) models, with Recursive Least 
Squares (RLS) algorithms in order to estimate the model parameters within 
the regions for ID or 2D fields. Another approach (Point, 1987) used 
Kalman filtering techniques in order to estimate the change instants in PSK 
signals. Hereby we present a new technique based on Kalman filtering 
techniques, which calculates the joint distribution of the measurements 
and the change process (defined as the transition process) over a finite 
length window. The approach presented in this section is based upon a 
Maximum a-Posteriori (MAP) framework (Cristi and Aviv, 1991). The signal 
of interest is described by a piece wise state-space modeling, with initial 
conditions set at the beginning of each interval. By applying the Kalman 
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filtering technique, the two tasks of segmentation and filtering over the 
segmented regions can be achieved. The method leads to hypothesis testing. 
Moreover, if the state-space matrices do not change in time, the algorithm can 
be implemented at a low computation cost. 

This chapter is organized as follows: Section B presents the model and 
the assumptions used to describe the prior needed for the algorithm. Section 
C presents the algorithm derivation, and finally. Section D presents 
simulation results. A short summary is given in Section E. 

B. MODEL DESCRIPTION 
1. Problem Statement 

Consider the state-space model 



where x IC (n) is the initial condition vector at instant n, and A,B are known 
matrices, c is a known vector, v(n) and w(n) are i.i.d. white Gaussian drivers 
with zero mean and known covariance Q and cr 2 . 



Notice the doubly stochastic nature of the process {x}. In this respect the 
process can be described as a combination of two models: one, modeling the 
regions corresponding to the initial condition, and one for state-space model 
itself. 




Ax(n) + Bv(n) no change 

x JC (n) change in initial conditions. 



y(n) = cx(n) + w{n) 



( 6 - 1 ) 



v - N( 0,Q) 



w ~ N(0 ,c?). 
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Let 7= {yin), n > 0}, yin)e {0,1} be defined as the process of transitions, 



i.e., 



0 




if x(h+1)= A x(n) + v(n) 
if x(n+l) = X /c 



2. Model Assiunptions 

Assumptions on x/c(n) and y(n) are needed. 

(1) Assumptions on xjc(n) are independence 

P(xjcin) | x/ C (n-l) ... x /c (0)) = P(x IC (n)) 
and that Pixjcin )) is Gaussian with known mean and variance. 

Pix IC (n)) = N(x_!, P.iX 

where x_i is the a priori mean and P_i is the covariance matrix of the vector 
X/c- 

(2) Assumptions on the transition process yin) are 

• There exists an integer d such that at most only one transition occurs in 
the process /during any interval [ f -d,t]. 

• The process y is assumed to be ^-Markov, in the sense that 

JW) I yit-1 ) ... m) = Piyit) I yit-1 ) ... yit-d)) 

for all t > d, which implies that the statistics of y are known from the 
last d samples. 

3. Probabilistic Model for the Transition Process yin) 

In order to assign a probability measure to y, define the following 
"truncated" sequence: 

Y(4+i = tyt-d ) ... )<0] 
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where the pair (t,d+ 1) defines the truncation boundaries, t defines the time 
index of the starting element, and d + 1 defines the sequence length. In a 
similar way we can define any other truncated sequence. 

The vector of possible realizations of yis defined by 

ej = [fy(0) . . . £j(d)} 7 e {0,l} d+1 j = -1, ..., d 



where 



and 




ifi = ; 
if i * j 



i = 0 ,...,d 
j = l,...,d 



e 



-l 



= 0 . 



The possible d+2 realizations of the vector Ey are of the form that at most only 
a single can be present at any i location (0 < i < d) corresponding to a 
change at location i. Thus, e s results in a change inside the window Ym+i at 
location (t-s), while e_i is by definition the no change vector. 

From the assumption of y being a Markov process with realizations 
Ey, we can determine its probabilistic model as 



pOO) = oIy ( _ m+ i) 



'1 if Yf-U+1 =£; j*~ 1 

,Po if Yi-w+i = £-i 



( 6 - 2 ) 



The reason behind this equation is the fact that }<f) = 0 with probability 1 if a 
transition exists in the interval [t-d- 1, 1-1] which defines the previous 
"sliding window." If there was no change in the previous window 
(Y<-M+i =£-i ), then )<f) = 0 with probability Pq, thus y(t) = 1 with probability 
Pi = 1 -Pq. Figure 6-1 shows realizations of the process y. 
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C THE DETECTION-ESTIMATION RECURSIVE ALGORITHM 

In this section the detection-estimation algorithm will be presented based 
on a Maximum a Posteriori (MAP) probability approach in order to extract the 
transition process from the observations {y} and estimate the process {*}. 
Notice that the transition process [y\ defines the regions of the same 
probabilistic nature (constant phase in the case of PSK signals), resulting in 
the segmentation task. Within this framework, the transition points t are 
indicated by the process and they correspond to y{t) = 1. The algorithm 
presented here is based on a "sliding window" of length d + 1 over which the 
likelihood of the transitions is recursively computed using the following 
lemma 1. 

1. Basic Lemma 

Lemma 1. Define Y t = [y(0) ..., y(0] 

Y t = W ) ..., 'jit)] 

Yt 4 = [yit-d-1) y(f)] 

and similarly: 

^14 + 1 = ly(t-d) •••/ y( 0 ] 
jt4+\ = i'jit-d) ■■■, yit)], 

Yt-u+i = [y(f-^-D -vy(f-D] 

Y t -\4 = [y(t-d ) ..., y(f-l)] 



then 
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^(X^i+1' 'Yt-d- 1 ) 



p(y(i)lY<-i.Ti,^i)p(y(<)lTt-i.^i,Y l -d-i) 
p(y(i - d - 1)1 V-i-i.yt-i-i ) p(r(i - d - I)|f ,- 4 - 2 ,y,-d-,) x 

pfr-wyt-u+Pt-j-i.it-d-i) (6-3) 

Proof: By Bayesian factorization: 

P (^t,d+1'yt,d+^t-d-\'yt-d-\) - 

= ^( y m+iI y t,d+'i'^t-d-\>y t-d-'i)’ piytj+il^t-d-vy t-d-i) ( 6_ 4 ) 

The left-most probability term can be further factored: 

p (\d+\\ Yu+l' Y <-rf-hT/-rf-l) = 

p[y{ Ol Y <-i;Y<^ + i>T^-i)- p ( Y <-i;Yf^ + i;Yf-i-i) _ 

^(Yf ,tf+] /' / Yi-d-i ) 

= p(y(0|VhYu+i;Yf-rf-i)x 

lXt-rf-2 ; ; Yt-rf-i ) • p( Vf-rf-2 ; Y<,rf+i ; ) 

p (y{ 1 - d - 1 )|' ^t-d -2 • Yf ,d +\ ; Yf-d -1 ) • Pftt-d- 1 ’ Yu+l ; Yf-d-i ) 

because of the Markov property of the yft) process, it is clear that in the 
conditioned probability terms, once Ym+i is known, then, Yf_^_i is redundant. 
Furthermore, since Yi-M+i is independent of y(t), the conditioned probability 
term in the numerator becomes: 

p ( Y t-l,d+l | Y t-d-2 ; Yf-i,rf )• 
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Notice also that in the denominator y(t-d-l) is independent of yt,d+i, hence, 
the left-most expression in (6-4) becomes: 



P(y(f - rf - 1)| V t _ d _ 2 ;f 



(6-5) 



The right-most probability term in (6-4) can be also further factored: 



p(y t ,d+Pt-d-Y’it-d-\) = 

p(v t - d -vyt-d-\) 



y P(y t -d-2’^t-d-\) 

P(%-d- 



p(r{t)\y t -\,diyt-d-v^t-d-\) 
p(r(t-d- qy t -d- 2 *t-d-\) 



• P(yt-U' r{t-d-^ t -d-2^t-d-\) 



Therefore, inserting (6-5) and (6-6) into (6-4) yields the desired recursion 
(6-3). □ 

By the recursion (6-3) we can update the statistics of the processes (y) 
and {y} over the interval [t-d,t] conditioned on past values of these processes. 



2. Likelihood Function of the Transition Process 

Using the probability model of the last section, the transition process 
{)} can be estimated at time (t-d), i.e., the edge of the window, on the basis of 
the observations up to time t, (i.e., using the data within the window [t-d,t]), 
by using the Kalman filter technique. 

The rationale behind this can be explained as follows: Suppose there 
was no (true) change in the signal's model. If the initial condition of the filter 
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is changed at any time instant within the interval [t-d,t], then there will be a 
probabilistic mismatch between the true signal and the estimated one (See 
Figure 6-2a). Suppose now that the initial condition is changed at the same 
time instant the true change occurred, then, by forcing the change to be 
evaluated at the edge of the sliding wdndow, namely at time t-d, will create a 
probabilistic match between the true signal and the predicted signal that relies 
on the maximum number of available observations (d) (See Figure 6- 2b.). 




Figure 6.2. Relationship between Estimated Signal and True Signal 

a. Probabilistic Mismatch 
b. Probabilistic Match 



189 



The proposed algorithm can be viewed in light of this interpretation 
as follows: at each time instant t we calculate d+2 likelihood terms Uitj), of all 
the possible realizations of {yj, each one of these realizations is associated 
with the assumption that the change occurred at the d + 1 possible locations 
within the window [t-d,t], and one corresponds to the "no change" 
hypothesis (see Figure 6.3). 



t-d t-di 



Figure 6.3. Calculating of d+2 Likelihood Terms 

The algorithm "looks" at all the possible realizations of {y\ within the 
window [t-d,t] and decides about a change in a way which will be described in 
the next section. 

3. Recursive Detection 

The recursive detection algorithm is based upon the likelihood terms 
as described in the last section. 

Define the MAP estimate of y as: 



t-2 M 



e 0 

t 



/i(e 2 ) 






/<(£*. i ) 



i,(e d ) 



-X 
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(6-7) 



y(t-d) = ar^max{p('/(/-dH Y |:Yl-,j-l)}- 

By standard Bayesian factorization, it is easy to see that 

P( Y U+1 :r(i- d )\yt-d-\: ) 

Since the rightmost term does not depend on yit-d), it is easy to see that 
maximization of (6-7) is equivalent to the maximization of 

y{i-d) = a^max{p(\ d+v y(* -dpt-d-l^t-d-l)} ( 6 " 8 ) 

The likelihood term in (6-8) can be recursively determined from the 
probability relation given by Lemma (6-3). 

In order to achieve the maximization efficiently, recall that y t ,d 
assumes only the realizations £y( j = - 1 , ..., d) since, at most one transition 
occurs within any interval [t-d,t], thus, the probability of "no change at ( t-d )" 
( yit-d ) = 0) is the union of all the possible mutually exclusive events of a 
change occurring at each of the other time instants within the window (j = 1, 
..., d- 1) including the event of no change (j = -1). 

The hypothesis of no change is given by 

d - 1 

H(6 P(Y ( , J+ „r(f-rf) = 0|Y,. i . 1 ,7,-i-l)= X'f(e,) (6-9) 

)=- 1 

and the hypothesis of change is given by 

H,: P(\d+1,r(t-d) = p,- i - U il l - d -i) = l l (e d ) (6-10) 

where 
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h ( ^ 7 ) — P{yi,d+\'Yt,d+\ ^jftt—d—l'Yt—d-l) 



j = -\,...,d. (6-11) 



Hence, the maximization (6-7) becomes hypothesis testing problem 



-1 * 

'*(«/) J h(*d) (6-12) 




where each likelihood term is given by (6-11) and calculated via the recursion 
(6-3). Equation (6-12) evaluates the likelihood /*(£<*) of a change at t-d 
against all the other possible changes within the window. Hence, the vector ey 
can also be viewed as an indicator vector of the change assumption (or re- 
initialization location of the Kalman filter). 

4. Implementation 

By careful examination of the recursion (6-3), it is clear that the 
denominator is constant with respect to the transition sequence y t ,d- 
Furthermore the right-most term in the numerator is given by (see Figure 



6 - 1 ) 



[l 

p (r(t)\Yt-u+V' y t-d-\) = \ p o 

h 



if Yf-l,d+l = e ; j * "I 
if Yt-u+i = e -i r(0 = ° 

if Y/-i,d+i = e -i 7(0 = 1 



Now, notice that the left-most term in the numerator of (6-3) is computed 
directly from the Kalman filter equations, since p|}’(/)|y,_i,y, ^ + j = Ejj is equal 

to P(y(r)|Y,_]) for j > 1, given that the filter was reinitialized at time t-j. Thus, 

the "past" sequence Y<_i is the "truncated past": (y(f-l) ... y(t-j)} and contains 
all the past observations since the filter's initialization. Hence, calculating 
(6-11) via (6-3) becomes: 
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/,(e ) ) = Cp(y(/)|Y ( . w )p(y(/)|T ( -,,*i;^-W tl )'<-,(e/-i) <6 - 13) 



i = \,...,d. 

where C is a constant independent of y t4- 

Using the realization map of {y\ (Figure 6.1), the update phase of the 
algorithm can be calculated as follows: 

Update: 



If 


f(t-d-l) = 0, then 




; = - 1: 


/ f (e_i) = CP(y(f) I Y m , y tr d+i = e_i) • p 0 ■ U- ite-i) 




) = 0: 


hi £o) = C-P(y(f) i Y M/ y^+i = Eo) • p-r /m(E-i) 


(6-14) 


for 


1 < j < d 






/f(e/) = C-P(y(t) | Yf_i, yt r d+i = cp • l t -\(cj-\) 




* If 


f(t-d-l) = 1, then 






//(e_i) = C-P(y(t) | Y M , y^ + i = £_,) • /,_i(ed) 


(6-15) 




7 

% 

>* 

o 

II 

& 





Change detection: 

at each time instant t check for: 

d -1 9(t - d) = 0 no change 

< h(*d) (6-16) 

j=-1 y(t-d) = 1 change 

The procedure introduced above can be represented on the graph shown in 
Figure 6.4. 

At each time instant t, the nodes marked as j = -1, 0 , d refer to 
the realizations of y t 4 +\ and the corresponding likelihood terms It (tj). These 
realizations of the likelihood terms are updated at each node according to 
equations (6-14) and (6-15). 
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Updating of these equations is done on the basis of the transition 
probability, i.e., the conditional probability terms P(y(f) | Y ( _i,) and is 
determined by the well known properties of the Kalman filter (Anderson and 
Moore, 1979) 

Let Xj be the state vector of the state-space model at node j. At 
each node j of the graph, we update the estimate of Xj, (i.e., xy) and 
consequently the probability terms are as follows: 

1 . if j = 1 , ..., d then 



2 . 



Time update (use estimate of filter j- 1): 



Xy(flf-l) = Axy.^f-llf-l) 

Vj(t\t - 1) = A Vy_! (f — Ilf — 1)A T +BQB t 
O bservation Update (Filtering): 

Xj{t\t) = Xj{t\t -l) + lj(t)[y(t)-c T Xj(t\t -l)\ 

1/(0 = Vy(flf - 1) • C • [c T Vy(flf - l)c + (J 2 ] 
Vy(/lf) = [l-l ; (Oc T ]Vy(flf-l). 



-1 



if j = 0, then: 



initialize filter. 
*o( ,lf ) = *-l 

V 0 (ill) = P-,. 



(6-17) 



(6-18) 



x_, and P_i being the initial state and initial filter error covariance 
matrix respectively. 
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3. if j = -1, then: 

x_i(f I f) is updated as in (6-17), (6-18) with the index change (see Figure 

6 . 1 ) 

. /.irt-J 5 -'*' -11 ' -1 ) tf f(*-^-n=o 

,l J l* 4 (t-u»-l) if y(f-d-l) = l. 

The state-space representation and the Kalman filter yields an efficient 
algorithm for the desired transition distribution P(y(f) |Y f-i/lM+l = £/ ), 

^)( y (')-c T */('l'- I )) 2 _ 

S j (f)=C T Vy(flf-l)c+<7 2 (6-19) 

namely, the desired distributions /*(/ ) are Gaussian with mean c T x ; (f I/-1) and 
covariance s ; (f). Hence, the likelihood terms (6-14) and (6-15) can be updated 
on-line by the following equation: 

1=0 

=cp,n/, i <j<i. 

i=0 

The presented algorithm lends itself to a parallel structure 
implementation. Furthermore, if the state-space matrices are not time 
dependent, the Kalman filter gains 1 j and covariance matrices V; can be 
precomputed, since, in this case the filter's performance is a priori known and 
not data dependent. Hence, lookup tables can be prepared resulting in a 
simple computational cost algorithm. 



1/2 

ft(i) = p (y(')l Vi/Ym+i = £ j) = ( 2 ™j( 0 ) ex P 
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D. RESULTS 



The algorithm was implemented and tested on different data structures as 
piecewise constant models, PSK models and AR models in different signal to 
noise ratios. The results obtained show that the transitions are estimated by 
the algorithm. 

Figure 6.5 illustrates the results obtained for detecting the transitions in 
piecewise constant signals represented by the state space model 

x(n + 1) = x(n) 

y(n) = x(n) + w(n) 




Figure 6.5. The Joint Detection-Estimation of a Piecewise Constant Signal. 

a. Noisy Data 
b. Filtered Data 



197 



Figure 6.6 illustrates the results obtained for an PSK signal with input SNR of 
of -3dB. Figure 6.7 illustrates the results obtained for an PSK signal with 
input SNR of -9dB. 




Figure 6.6. PSK Signal with Input SNR of -3dB, 
a. Noisy Data 
b. Filtered Data 
c. Estimated Transitions 
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b 




Figure 6.7. PSK Signal with Input SNR of -9dB, 

a. Noisy Data 

b. Filtered Data 

c. Estimated Transitions 
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Figure 6.8 illustrates data obtained by using an AR model. Figure 6.9 
illustrates the estimated transitions while Figures 6.10 and 6.11 illustrate the 
true and estimated AR parameters. 




Figure 6.8. The Original AR Data 
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Figure 6.9. Estimated Transitions 
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Figure 6.11. True X 2 and Estimated X 2 AR Parameter 
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E SUMMARY 

The problem of joint detection-estimation is addressed in this chapter. 
The state-space representation of the signal model allows joint detection 
estimation by using the Kalman filter properties. Furthermore, the detection 
is completely asynchronous. Since the algorithm is based on optimal 
estimation techniques, it is expected to be able to detect signals in the presence 
of very low SNR. However simulation results do not permit this type of 
conclusion. Detailed performance analysis of this algorithm is not available 
now and requires more research. 
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VII. CONCLUSIONS 



This dissertation investigated different types of disorder problems by 
using sequential procedures for on-line implementation. The problem was 
considered within the framework of detecting changes in statistical models of 
an observed random process when the disorder can occur at unknown times. 
The focus of this work was on quickest detection methods for cumsum 
procedures implemented for different parametric and nonparametric 
nonlinearities. In this context, several issues remain unresolved, namely, for 
a multiple disorder problem or for transient detection a critical issue is the 
joint estimation of disorder time and the model parameters. There is much 
more to do in investigating this problem by implementing recursive 
identification procedures together with detection procedures. 

In Chapter III, the concept of detecting energy changes in the Energy 
Spectral Density of a signal reflect different spectral signatures and is of 
interest in many applications. More work can still be done in the theoretical 
domain in order to examine the coupling effects between window sizes, 
averaging methods within a window with the root location and the minimal 
SNR needed for detection. Moreover, modern spectral energy estimators 
might be considered rather than the transitional periodogram. 

The detection algorithm which was presented in Chapter VI has an 
advantage of being noncoherent with respect to coherent detectors for PSK 
type signals. Even though the algorithm is optimal in the sense that optimal 
techniques (Kalman filtering) were used, there is still room for investigating 
its performance as a function of window length. Also, the merits of this 
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approach should be compared to traditional detection methods of PSK signals 
to check the tradeoff between noncoherent versus coherent detection. 

The disorder problem can be considered a local problem. Thus, 
conventional time frequency methods for detecting and estimating the 
change parameters have the problem of the tradeoff between the time- 
frequency resolution. It seems that the wavelet representation which has 
become popular recently has the potential to resolve this time-frequency 
resolution problem. 

Finally, the research can be extended to the situation where the 
measurements are dependent (Sadowsky, 1989) for providing a parallel 
framework for evaluating Page test performance. 
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APPENDIX BASIC CONCEPTS OF HYPOTHESIS TESTING AND 

DETECTION THEORY 
(FROM KASSAM, 1988) 

Let X = (X], X 2 , ..., X n ) be a random vector of observations with joint 
probability density function (pdf) P x (x I 0), where 6 is a parameter of the 
density function. Any specific realization x = (xi, X 2 , x n ) of X will be a point 
in SR", where SR is the set of all real numbers. In binary hypothesis-testing 
problems we have to decide between one of two hypotheses, which we will 
label as Ho and Hi about the pdf P x (x I 0), given an observation vector in SR”. 
Let 0 be the set of all possible values of 6; we usually identify H 0 with one 
subset 0h o of 6 values and H] with a disjoint subset 0h- [ , so that 0 = 6>// o u 
6>// r This may be expressed formally as 

Hq: X has pdf P x (xl0) with 6 e 0^ Q (A-l) 

Hy X has pdf P x (xl0) with 6 e 0 Hy (A -2) 

If 0h o and 0i f are made up of single elements, say 0u o and 0n v respectively, 
we say that the hypotheses are simple; otherwise the hypotheses are 
composite. If 0 can be viewed as a subset of SR r for a finite integer p, the pdf 
P x (x I 0) is completely specified by the finite number p of real components of 6, 
and we say that our hypotheses are parametric. 

A test for the hypothesis H 0 against H] may be specified as a partition of 
the same space S = SR" of observations into disjoint subsets Sh o and so 
that x falling in Sh 0 leads to acceptance of Ho, with H] accepted otherwise. 
This may also be expressed by a test function which is defined to have value 
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5(x) = 1 for x e S and value 5(x) = 0 for x e Sh g - The value of the test 
function is defined to be the probability with which the hypothesis Hi, the 
alternative hypothesis, is accepted. The hypothesis H o is called the null 
hypothesis. 

More generally, the test function can be allowed to take on probability 
values in the closed interval [0,1]. A test based on a test function taking on 
values inside [0,1] is called a randomized test. 

The power function T{6 \ 8) of a test based on a test function 8 is defined 
for 6 e &n o u 0/q as 

£p(6»l^) = E{6(x)l^} (A- 3) 

= J<S(x)P x (xl#)dx. 
ft” 

Thus it is the probability with which the test will accept the alternative 
hypothesis Hi for any particular parameter value 6. When 6 is in Sn o the 
value of 1\d\ 8) gives the probability of an error, that of accepting H\ when H o 
is correct. This is called a type I error or the probability of false alarm, and 
depends on the particular value of 6 in G[j o . The size of a test is the quantity 

a = sup :P(0I(5) (A -4) 

6e6 l}o 

which may be considered as being the best upper bound on the type I error 
probability of the test. 

Similarly, we define the Operating Characteristic (OC) of a test Q(0l 8), 
based on a test function 5, as the probability with which a test will accept the 
null hypothesis H o for any particular parameter value 6. When 6 is in &u q 
the value of Q(0l 8) gives the confidence (1 -a), that of accepting Hq when Hq 
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is correct. When 6 is in the value of Q(0l 5) gives the probability of miss 
(ft), that of accepting Ho when H] is correct. This is called a type II error and 
depends on the particular value of 6 in 0/^ Figure A.l illustrates a typical OC 
function of a test. 




Figure A.l. A Typical Operating Characteristic Function of a Test 

In signal detection the null hypothesis is often a noise-only hypothesis, 
and the alternative hypothesis expresses the presence of a signal in the 
observations. For a detector D implementing a test function <5>(x) the power 
function evaluated for any 6 in & gives a probability of detection of the 
signal. Thus, we will use the notation (PiO I D) for the power function of a 
detector D, and in discussing the probability of detection at a particular value 
of the parameter 6 in (or for a simple alternative hypothesis Hj) we will 
use for it the notation Pd- The size of a detector is often called its false-alarm 
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probability. This usage is encountered specifically when the noise-only null 
hypothesis is simple, and the notation for this probability is Pfa- 



A. MOST POWERFUL TESTS AND THE NEYMAN-PEARSON LEMMA 

Given a problem of binary hypothesis testing such as defined by (A-l) and 
(A-2), the question arises as to how one may define and then construct an 
optimum test. Ideally, one would like to have a test for which the power 
function 2KB 1 5) has values close to zero for B in and has values close to 
unity for 6 in These are, however, conflicting requirements. We can 

instead impose the condition that the size a of any acceptable test be no larger 
than some reasonable level ocq, and subject to this condition look for a test for 
which 2KB I 5), evaluated at a particular value 6 of 6 in has its largest 
possible value. Such a test is most powerful at level ocq in testing Hq against 
the simple alternative 6 = Bj^ in Oyiy its test function S*(x) satisfies 



for all other test function 8(x) of size less than or equal to ao ■ In most cases of 
interest a most powerful level ao test satisfies (A-5) with equality, so that its 
size is a = at). 

For a simple null hypothesis Ho when B = Bm q is the only parameter 
value in 0 /j o , the condition (A-5) becomes 2KB^ : I S*) < ao or Pfa - 0, subject to 
which Pd at B = 0// 1 is maximized. For this problem of testing a simple Ho 
against a simple Hi, a fundamental result of Neyman and Pearson (called the 



sup t(B\ 8 *)<ar\ 
8e6 Ho 



(A-5) 




(A -6) 
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Neyman-Pearson lemma) gives the structure of the most powerful test. We 
state the result here as a theorem: 

Theorem 1: Let <5(x) be a test function of a the form 



S(x) = 



1 

< r ( x ) 

0 



Px[x\o Hl )> tp x [x\e Ho ) 
p x (x\e H J = tP x (x\e Ho ) 



(A-7) 



for some constant t > 0 and some function r(x) taking on values in [0,1]. Then 
the resulting test is most powerful at level equal to its size for Ho'- 6 = 6 h 0 
versus Hi'. 6 = 9 h v 

In addition to the above sufficient condition for a most powerful test it 
can be shown that conversely, if a test is known to be most powerful at level 
equal to its size, then its test function must be of the form (A-7) except 
perhaps on a set of x values of probability measure zero. Additionally, we 
may always require r(x) in (A-7) to be a constant r in [0,1]. Finally, we note 
that we are always guaranteed the existence of such a test for Ho versus Hi, of 
given size a [Lehmann, 1959, Ch. 3]. 

From the above result we see that generally the structure of a most 
powerful test may be described as one comparing a likelihood ratio to 
constant threshold, 



P *( xle H<>) 



(^- 8 ) 



in deciding if the alternative Hi is to be accepted. If the likelihood ratio on the 
left-hand side of (A-8) equals the threshold value t, the alternative Hi may be 
accepted with some probability r (the randomization probability). The 
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constants t and r may be evaluated to obtain a desired size a using knowledge 
of the distribution function of the likelihood ratio under Hq. 

When the alternative hypothesis Hi is composite we may look for a test 
which is uniformly most powerful (UMP) in testing Hq against H\, that is, one 
which is most powerful for Hq against each 6= 6h 1 in &H r While UMP tests 
can be found in some cases, notably in many situations involving Gaussian 
noise in signal detection, such tests do not exist for many other problems of 
interest. One option in such situations is to place further restrictions on the 
class of acceptable or admissible tests in defining a most powerful test; for 
example, a requirement of unbiasedness or of invariance may be imposed 
[Lehmann, 1959, Ch. 4-6]. As an alternative, other performance criteria based 
on the power function may be employed. We will consider one such 
criterion, leading to locally optimum or locally most powerful tests for 
composite alternatives, in the next section. One approach to obtaining 
reasonable tests for composite hypotheses is to use maximum-likelihood 
estimates and 0^ of the parameter 6, obtained under the constraints of 
6 e ©h 0 and &Hi> respectively, in place of 6h 0 and 0/-^ in (A-8). The 
resulting test is called a generalized likelihood ratio (GLR) test or simply a 
likelihood ratio test. 

B. LOCAL OPTIMALITY AND THE GENERALIZED NEYMAN-PEARSON 

LEMMA 

Let us now consider the approach to construction of tests for composite 
alternative hypotheses. In this approach attention is concentrated on 
alternatives 0 = 0 in 0/^, which are close, in the sense of a metric or 
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distance, to the null-hypothesis parameter value B = Bh q . Specifically, let 6 be 
a real-valued parameter with value 9 - Bq defining the simple null hypothesis 
and let 6 > Bo define the composite alternative hypothesis. Consider the class 
of all tests based on test functions 8(x) of a particular desired size a for 6 = Bq 
against 6 > Bo, and assume that the power function T(B\ 5) of these tests are 
continuous and also continuously differentiable at B = Bq. Then if w r e are 
interested primarily in performance for alternatives which are close to the 
null hypothesis, we can use as a measure of performance the slope of the 
power function at B = Bo, that is 

i > '(e o i5)=r(ei5)| 9 , 0o ( a - 9 ) 

From among our class of tests of size a, the test based on S*(x) which 
uniquely maximizes 'P'iBo I B) has a power function satisfying 

T'(B\5*)>(P{B\8), Bq < B < B max (A - 10) 

for some B max > Bo- Such a test is called a locally most powerful or locally 
optimum (LO) test for B = flo against B > It is clearly of interest in situations 
such as the weak-signal case in signal detection, when the alternative- 
hypothesis parameter values of primary concern are those which define pdf's 
P x (x I B) close to the null-hypothesis noise-only pdf P x (x I 0/j Q ). 

The following generalization of the Neyman-Pearson fundamental result 
of Theorem 1 can be used to obtain the structure of an LO test: 
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Theorem 2: Let g(x) and h\(x), /^(x), h m (x) be real-valued and integrable 
functions defined on Let an integrable function <5(x) on 5\" have the 
characteristics 



m 

1 / s( x )> 

1=1 
m 

8(x) = \r(x) , ^(x)= £f,-ft,-(x) 

i=l 
m 

0 / g( x )<Z t i h t( x ) 

i=l 



(^-11) 



for a set of constants f; > 0, i = 1, 2, ..., m, and where 0 < r(x) < 1. Define, for 
i = 1, 2, ..., m, the quantities 

a, = J<5(x)/ij(x)dx. (A -12) 

9t n 



Then from within the class of all test functions satisfying the m constraints 
(A-12), the function <5(x) defined by (A-ll) maximizes J5(x)g(x)dx. 

A more complete version of the above theorem, and its proof, may be 
found in [Lehmann, 1959, Ch. 3); Ferguson [1967, Ch. 5] also discusses the use 
of this result. 

To use the above result in finding an LO test for 6 - 6q against 8 >6q 
defining Oh 0 and in (A-l) and (A-2), respectively, let us write (A-9) 
explicitly as 






d_ 

dd 



J<5(x)P x (xl0)ix 

9t n 



6=00 



= j$( x )-^ p x(*\O)\e=e 0 dx 

9t” 



(A -13) 
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assuming that our pdfs are such as to allow the interchange of the order in 
which limits and integration operations are performed. Taking m = 1 and 
identifying h-\(x) with P x (x I and g(x) with -~^Px(x\6^q = q q in Theorem 2, we 
are led to the locally optimum test which accepts the alternative H\\ 6> 6q 
when 



Mxl0 O ) 



>t 



(A - 14) 



where t is the test threshold value which results in a size*a test satisfying 

E{S(X)\H:6 = 6 0 } = a. (A - 15) 



The test of (A-14) may also be expressed as one accepting the alternative 
when 



^ln{P x (x!0)} 



0=00 



X. 



(A -16) 



Theorem 2 may also be used to obtain tests maximizing the second 
derivative T"(6o I $) at 0 = 0o- This would be appropriate to attempt if it so 
happens that ( P'{6q\ 8) = 0 for all size-a tests for a given problem. The 



condition T'(6q\ 8) = 0 will occur if 



c( xl 0 )|e=< 



is zero, assuming the 



d6 * v '\ a=e o 

requisite regularity conditions mentioned above. In this case Theorem 2 can 
be applied to obtain the locally optimum test accepting the alternative 
hypothesis Hy 6 > 6o when 



dd 



2 x 



( xl ^0= 



00 



P x ( X l0) 



> t. 



I?) 
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One type of problem for which Theorem 2 is useful in characterizing 
locally optimum tests is that of testing 8=8 o against the two-sided alternative 
hypothesis 6 * 8q. We have previously mentioned that one can impose the 
condition of unbiasedness on the allowable tests for a problem. Unbiasedness 
of a size-a test for the hypotheses Hq and H\ of (A-l) and (A-2) means that the 
test satisfies 



so that the detection probability for any e , is never less than the size 

a. For the two-sided alternative hypothesis 8 * 6$, suppose the pdf's P x (x I 6) 
are sufficiently regular so that the power functions of all tests are twice 
continuously differentiable at 6 = 6q. Then it follows that for any unbiased 
size-a test we will have T(6q\S) = a and ^{8q I 5) = 0. Thus, the test function 

of a locally optimum unbiased test can be characterized by using these two 
constraints and maximizing < P\6q\ 8) in Theorem 2. Another interpretation 
of the above approach for the two-sided alternative hypothesis is that the 
quantity co = ( 6-8q ) 2 may then be used as a measure of the distance of any 
alternative hypothesis from the null hypothesis 8 = 8q. We have 



T(8\5)<a, alldee Ho 



(A -18) 



'B(e\s)<a, all«€% 



(A -19) 




= \v(e 0 \s) 



(A - 20) 
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if (P(6q I 5) is zero, for sufficiently regular pdf's P x (x I 6). Thus if (P(6u I <5) is zero 
for a class of size-a tests, then maximization of tP\&o I <5) leads to a test which is 
locally optimum within that class. 

In this appendix we are concerned with problems where the noise density 
function P is completely specified, as a special case of the general parametric 
problem where P may have a finite number of unknown parameters (such as 
the noise variance). Our detection problem can be formulated as a statistical 
hypothesis-testing problem of choosing between a null hypothesis H o and an 
alternative hypothesis H] describing the joint density function Px of the 
observation vector X, with 

H 0 : P„(*)-nPM 
1 = 1 

n 

P„(x)=n p (»-, - &, ), s specified, any 6 > 0. (A - 22) 

i=l 

Here s is the vector (si,S 2 , ..., s„) of signal components. Note that we are 
considering parametric hypotheses which completely define Px to within a 
finite number of unknowm parameters (here with only 6 > 0 unknowm under 
the alternative hypothesis). Let us now proceed to obtain the structures of 
tests for Ho versus H\. 

C LOCALLY OPTIMUM DETECTION AND ASYMPTOTIC OPTIMALITY 

Since the alternative hypothesis H] is not a simple hypothesis, the signal 
amplitude value being unspecified, we cannot apply directly the fundamental 
lemma of Neyman and Pearson to obtain the structure of the optimum 
detector for the detection problem. For non-Gaussian noise densities it is also 
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generally impossible to obtain UMP tests for the composite alternative 
hypothesis H\. 

To illustrate the difficulty, consider the special case where P is specified to 
be the double-exponential noise density function defined by 

P(x) = ^e~ a ^,a> 0. (A -23) 

The likelihood ratio for testing Ho versus H i for a particular value 6 = 6 o > 0, 
is 



This now becomes 



wq = n 



1 = 1 



P{*i - 0QS») 

pM 



L(X) = e 



giving 

\nL(X) = af j (\x i \-\x i -e o s i \). 
i=l 

Thus for given 6 = do, the test based on 

*( X ) = X(N” |*i-0os«|) 

;=t 



(4-24) 



(A- 25) 



(4-26) 



(A - 27) 



is an optimum test, since the constant a is positive. The optimum detector 
therefore has a test function defined by 
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(A - 28) 



1 



w= 



r 



0 



A(X)>f 
X(X) = t 

X(x)<t 



where the threshold t and randomization probability r are chosen to obtain 
the desired value for the false-alarm probability so that the equation 

EfSMiH,}-?,:,, (A -29) 

is satisfied. Notice that we do not need randomization at A(X) = r if this 
event has zero probability under Ho- 

We can express A(X) of (2-19) in the form 

*(X) = £/(*,;%>,) (>1-30) 

1 = 1 

where the characteristic / is defined by 

/(i ; e s ) = W-|x-e s |. (A-3i) 

This is shown in Figure A. 2 as a function of x and depends strongly on 6, so 
that A(X) cannot be expressed in a simpler form decoupling ty) and the x,. For 
an implementation of the test statistic A(X) the value Oq of 6 must be known, 
and a UMP test does not exist for this problem for n > 1. 

One approach we might take in the above case is to use a generalized 
likelihood ratio (GLR) test, here obtained by using as the test statistic A(X) of 
(A-27) with 6q replaced by its maximum likelihood (ML) estimate under the 
alternative hypothesis H\. This maximum-likelihood estimate 6^i is given 
implicitly as the solution of the equation 
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(A -32) 



X s ! S 8 n ( x i - & ML «i) = o 
i=l 



where 



sgn(x) = < 



1 

0 

-1 



, x > 0 
, x = 0 
, x < 0 



(A -33) 



provided that the solution turns out to he non-negative; otherwise, 9ml = 0. 
Thus the implementation of the GLR test is not simple. In addition, the 
distribution of the GLR test statistic under the null hypothesis is not easily 
obtained. 



i(x;6s) 




Figure A.2. The Characteristic l(x;6 s ) of Equation A-31 



In the general case, for any noise density function P, the optimum detector for 
given 6 = 6 q > 0 under H] can be based on the test statistic 
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A(X) = lnL(X) 



£5 pM 



(A -34) 



which is of the form of A(X) of (A-30). But again, flo must be specified and the 
detector will be optimum only for a signal with that amplitude. The GLR 
detector can be obtained if the ML estimate O^l of 6 can be found under the 
constraint that &ML be non-negative. Once again, in general this will not lead 
to an easily implemented and easily analyzed system. 



D. LOCALLY OPTIMUM DETECTORS 

The above discussion shows that we have to search further in order to 
obtain reasonable schemes for detection of a known signal of unspecified 
amplitude in additive non-Gaussian noise. By a "reasonable" scheme we 
mean a detector that is practical to implement and relatively easy to analyze 
for performance, which should be acceptable for the anticipated range of input 
signal amplitudes. Fortunately, there is one performance criterion with 
respect to which it is possible to derive a simple and useful canonical 
structure for the optimum detector for our detection problem. This is the 
criterion of local detection power, and leads to detectors which are said to be 
locally optimum. 

A locally optimum (LO) or locally most powerful detector is one which 
maximizes the slope of the detector power function at the origin ( 6 = 0), from 
among the class of all detectors which have its false alarm probability. Let A a 
be the class of detectors of size a for Ho versus H]. In our notation any 
detector D in A a is based on a test function 5(X) for which 
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(A -35) 



Let 2fc(0 1 D) be the power function of detector D, that is, 

!P (i (0ID) = E{5(X)|H 1 }. (*-36) 



Formally, an LO detector Dio of size a is a detector in A a which satisfies 



d 

max — 
DeA a dd 






9=0 



d_ 

de 



?d(MD L o) 



9=0 



(A = 37) 



It would be appropriate to use a locally optimum detector when one is 
interested primarily in detecting weak signals, for which 6 under the 
alternative hypothesis H i remains close to zero. The idea is that an LO 
detector has a larger slope for its power function at 9 = 0 than any other 
detector D of the same size which is not an LO detector, and this will ensure 
that the power of the LO detector will be larger than that of the other detector 
at least for 6 in some non-null interval (O,0 max ), with 0 max depending on D. 
This is illustrated in Figure A. 3. Note that if an LO detector is not unique, 
then one may be better than another for 6 > 0. There is good reason to be 
concerned primarily with weak-signal detection. It is the weak signal that one 
has the most difficulty in detecting, whereas most ad hoc detection schemes 
should perform adequately for strong signals; after all, the detection 
probability is upper bounded by unity. 
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Figure A.3. Power Functions of Optimum and LO Detectors 



To obtain explicitly the canonical form of the LO detector, we can apply 
the generalized Neyman-Pearson lemma of Section A. 2. Now the power 
function of a detector D based on a test function <5(X) is 

(P d {e\ D)= J (A - 38) 

9?" 1=1 

where the integration is over the n-dimensional Euclidean space 9\”. The 
regularity Assumptions allow us to get 
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dx 



e=o 



j <S(x) 

9i n 



f s 

s ' ^«) 



iw* 



j=i 



= E 



«(X) 






u=i 



P'M 



(A - 39) 



from this it follows, from the generalized Neyman-Pearson lemma, that a 
locally optimum detector D/ 0 is based on the test statistic 



7 (X)-_fc P '( x i) 

" 0< ) S ’" pM 



~ SLo{ x i ) 
i=l 



where is the function defined by 



£&>(*) = - 



P'(x ) 



P(x) 



Note that we may express A/ 0 (X) as 



^o( x ) = Z-^ lnP ( x i-^s,) 



0=0 



— y ln P ( Xf _0S ') 
a a L 



P(*i) 



6=0 



(,4-40) 



(^-41) 



M “ 42) 



from which the LO detector test statistic (multiplied by 6) is seen to be a first- 
order approximation of the optimum detector test statistic given by A-34. 
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For the double-exponential noise density of (A-23) we find that g/ 0 is 
given by 

gl 0 {x) = asgn{x). (A -43) 

Note that the optimum detector for 6 = do in this case is based on the test 
statistic A(X) of (A-27). Similarly, for a zero mean Gaussian density with 
variance o 2 we have 

gM = ^2- (A-44) 
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